Abstract

In this paper, we consider the problem of scheduling an application on a parallel computational platform. The application is a particular task graph, either a linear chain of tasks or a set of independent tasks. The platform is made of identical processors, whose speed can be dynamically modified. It is also subject to failures: if a processor is slowed down to decrease the energy consumption, it has a higher chance to fail. Therefore, the scheduling problem requires re-executing or replicating tasks (i.e., executing the same task twice, either on the same processor or on two distinct processors) in order to increase the reliability. It is a tri-criteria problem: the goal is to minimize the energy consumption, while enforcing a bound on the total execution time (the makespan) and a constraint on the reliability of each task.

Our main contribution is to propose approximation algorithms for these particular classes of task graphs. For linear chains, we design a fully polynomial time approximation scheme. However, we show that there exists no constant factor approximation algorithm for independent tasks, unless P=NP; in this case, we propose an approximation algorithm with a relaxation on the makespan constraint.

1 Introduction 



Energy-awareness is now recognized as a first-class constraint in the design of new scheduling algorithms. To help reduce energy dissipation, current processors from AMD, Intel and Transmeta allow the speed to be set dynamically, using a dynamic voltage and frequency scaling technique (DVFS). Indeed, a processor running at speed $s$ dissipates $s^3$ watts per unit of time [...]. However, it has been recognized that reducing the speed of a processor has a negative effect on the reliability of a schedule: if a processor is slowed down, it has a higher chance to be subject to transient failures, caused for instance by software errors [20, ...].

Motivated by the application of speed scaling on large-scale machines [...], we consider a tri-criteria problem energy/reliability/makespan: the goal is to minimize the energy consumption, while enforcing a bound on the makespan, i.e., the total execution time, and a constraint on the reliability of each task. The application is a particular task graph, either a linear chain of tasks or a set of independent tasks. The platform is made of identical processors, whose speed can be dynamically modified.

In order to make up for the loss in reliability due to the energy efficiency, we consider two standard techniques: re-execution consists in executing a task twice on the same processor [20, 19], while replication consists in executing the same task on two distinct processors simultaneously [2]. We do not consider checkpointing, which consists in "saving" the work done at some points, hence reducing the amount of work lost when a failure occurs [...].

The schedule therefore requires us to (i) decide which tasks are re-executed or replicated; (ii) decide on which processor(s) each task is executed; (iii) decide at which speed each processor is processing each task. For a given schedule, we can compute the total execution time, also called makespan, and it should not exceed a prescribed deadline. Each task has a reliability that can be computed given its execution speed and its possible replication or re-execution, and we must enforce that the execution of each task is reliable enough. Finally, we aim at minimizing the energy consumption. Note that we consider a set of homogeneous processors, but each processor may run at a different speed; this corresponds to typical current platforms with DVFS.

Related work. The problem of minimizing the energy consumption without exceeding a given deadline, using DVFS, has been widely studied, without accounting for reliability issues. The problem for a linear chain of tasks is known to be solvable in polynomial time in this case, see [3]. [1] showed that the problem of scheduling independent tasks can be approximated by a factor $(1 + \varepsilon)$: they exhibit a polynomial time approximation scheme (PTAS). [9] studied the performance of greedy algorithms for the problem of scheduling independent tasks, with the objective of minimizing the energy consumption, and proposed some approximation algorithms.

None of these works accounts for reliability issues. However, [20] showed that reducing the speed of a processor increases the transient failure rate of the system; the probability of failures increases exponentially, and this probability cannot be neglected in large-scale computing [15]. Few authors have tackled the tri-criteria problem including reliability, and to the best of our knowledge, there are no approximation algorithms for this problem. [19] initiated the study of this problem, using re-execution. However, they restrict their study to the scheduling problem on a single processor, and do not try to find any approximation ratio on their algorithm. [2] have recently proposed an off-line tri-criteria scheduling heuristic (TSH), which uses replication to minimize the makespan, with a threshold on the global failure rate and the maximum power consumption. TSH is an improved critical-path list scheduling heuristic that takes into account power and reliability before deciding which task to assign and to replicate onto the next free processors. However, the complexity of this heuristic is unfortunately exponential in the number of processors, and the authors did not try to give an approximation ratio on their heuristic. Finally, [4] also study the tri-criteria problem, but from a heuristic point of view, without trying to ensure any approximation ratio on their heuristics. Moreover, they do not consider replication of tasks, but only re-execution as in [19]. However, they present a formal model of the tri-criteria problem, re-used in this paper.

Finally, there is some related work specific to the problem of independent tasks, since several approximation algorithms have been proposed for variants of the problem. One may try to minimize the $\ell_k$ norm, i.e., the quantity $\left(\sum_{q=1}^{p} \left(\sum_{i \in load(q)} a_i\right)^k\right)^{1/k}$ with $p$ processors, where $i \in load(q)$ means that task $T_i$ is assigned to processor $q$, and $a_i$ is the weight of task $T_i$ [...]. Minimizing the power consumption then amounts to minimizing the $\ell_3$ norm [9], and the problem of makespan minimization is equivalent to minimizing the $\ell_\infty$ norm, i.e., minimizing $\max_{1 \le q \le p} \sum_{i \in load(q)} a_i$ [...]. These problems are typical load balancing problems, in which the load (computation requirement of the tasks) must be balanced between processors, according to various criteria.
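As a small illustration of these norms, the sketch below computes per-processor loads for a given assignment and evaluates the $\ell_k$ norm; the function names are hypothetical. One way to see the link with energy, under the model $E = w f^2$ per execution used later in this paper: a processor with load $L_q$ that runs at constant speed $L_q/D$ to finish exactly at the deadline $D$ spends $L_q^3/D^2$, so the total energy is the cube of the $\ell_3$ norm divided by $D^2$.

```python
from typing import List, Sequence


def processor_loads(weights: Sequence[float], assignment: Sequence[int], p: int) -> List[float]:
    """Sum of the task weights a_i assigned to each of the p processors."""
    loads = [0.0] * p
    for a_i, q in zip(weights, assignment):
        loads[q] += a_i
    return loads


def lk_norm(loads: Sequence[float], k: float) -> float:
    """(sum_q load_q**k)**(1/k); k = inf gives the makespan max_q load_q."""
    if k == float("inf"):
        return max(loads)
    return sum(x ** k for x in loads) ** (1.0 / k)


def energy_at_deadline(loads: Sequence[float], D: float) -> float:
    """Each processor runs at speed load/D and spends load * (load/D)**2."""
    return sum(x * (x / D) ** 2 for x in loads)


loads = processor_loads([4.0, 3.0, 2.0, 1.0], [0, 1, 1, 0], p=2)
print(loads)                            # [5.0, 5.0]
print(lk_norm(loads, float("inf")))     # 5.0 (makespan at unit speed)
print(energy_at_deadline(loads, 10.0))  # 2.5, i.e. lk_norm(loads, 3)**3 / 10**2
```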

Main contributions. In this paper, we investigate the tri-criteria problem of minimizing the energy with a bound on the makespan and a constraint on the reliability. First, in Section 2, we formally introduce this tri-criteria scheduling problem, based on the previous models proposed by [19] and [4]. To the best of our knowledge, this is the first model including both re-execution and replication in order to deal with failures. The main contribution of this paper is then to provide approximation algorithms for some particular instances of this tri-criteria problem.

For linear chains of tasks, we propose a fully polynomial time approximation scheme (Section 3). Then, in Section 4, we show that there exists no constant factor approximation algorithm for the tri-criteria problem with independent tasks, unless P=NP. We prove that by relaxing the constraint on the makespan, we are able to give a polynomial time constant factor approximation algorithm. To the best of our knowledge, these are the first approximation algorithms for the tri-criteria problem.

2 Framework 

Consider an application task graph $\mathcal{G} = (V, \mathcal{E})$, where $V = \{T_1, T_2, \dots, T_n\}$ is the set of tasks, $n = |V|$, and where $\mathcal{E}$ is the set of precedence edges between tasks. For $1 \le i \le n$, task $T_i$ has a weight $w_i$, which corresponds to the computation requirement of the task. $S = \sum_{i=1}^{n} w_i$ is the sum of the computation requirements of all tasks.

The goal is to map the task graph onto p identical processors, with the objective 
of minimizing the total energy consumption, while enforcing a bound on the total ex- 
ecution time (makespan), and matching a reliability constraint. Processors can have 
arbitrary speeds, determined by their frequency, that can take any value in the interval
$[f_{\min}, f_{\max}]$ (dynamic voltage and frequency scaling with continuous speeds). Higher
frequencies, and hence faster speeds, allow for a faster execution, but they also lead to 
a much higher (supra-linear) power consumption. Moreover, reducing the frequency of 
a processor increases the number of transient failures of the system. Therefore, some 
tasks are executed once at a speed high enough to satisfy the reliability constraint, while 
some other tasks are executed several times (either on the same processor, or on differ- 
ent processors), at a lower speed. We detail below the conditions that are enforced on 
the corresponding execution speeds. The problem is therefore to decide which tasks 
should be executed several times, on which processor, and at which speed to run each 
execution of a task, as well as the schedule, i.e., in which order the tasks are executed 
on each processor. Note that [4] showed that it is always better to execute a task at a
single speed, and therefore we assume in the following that each execution of a task is 
done at a single speed. 

We now detail the three objective criteria (makespan, reliability, energy), and then
formally define the problem.
2.1 Makespan

The makespan of a schedule is its total execution time. The first task is scheduled at time 0, so that the makespan of a schedule is simply the maximum time at which one of the processors finishes its computations. Given a schedule, the makespan should not exceed the prescribed deadline $D$.

Let $\mathit{Exe}(w_i, f)$ be the execution time of a task $T_i$ of weight $w_i$ at speed $f$. We assume that the cache size is adapted to the application, therefore ensuring that the execution time is inversely proportional to the frequency [14]: $\mathit{Exe}(w_i, f) = \frac{w_i}{f}$. Note that we consider a worst-case scenario, and the deadline $D$ must be met even in the case where all tasks that are scheduled to be executed several times fail during their first executions; hence all execution times of the same task should be accounted for.

2.2 Reliability 

To define the reliability, we use the failure model of [20] and [19]. Transient failures are failures caused by software errors, for example. They invalidate only the execution of the current task, and the processor subject to that failure will be able to recover and execute the subsequent tasks assigned to it (if any). In addition, we use the reliability model introduced by [17], which states that the radiation-induced transient failures follow a Poisson distribution. The parameter $\lambda$ of the Poisson distribution is then

$$\lambda(f) = \lambda_0\, e^{\frac{d\,(f_{\max} - f)}{f_{\max} - f_{\min}}},$$

where $f_{\min} \le f \le f_{\max}$ is the processing speed, the exponent $d \ge 0$ is a constant, indicating the sensitivity of failure rates to dynamic voltage and frequency scaling, and $\lambda_0$ is the average failure rate at speed $f_{\max}$. We see that reducing the speed for energy saving increases the failure rate exponentially. The reliability of a task $T_i$ executed once at speed $f$ is

$$R_i(f) = e^{-\lambda(f) \times \mathit{Exe}(w_i, f)}.$$

Because the failure rate $\lambda_0$ is usually very small, of the order of $10^{-5}$ per time unit [...], or even $10^{-6}$ [7, 16], we can use the first order approximation of $R_i(f)$ as

$$
\begin{aligned}
R_i(f) &= 1 - \lambda(f) \times \mathit{Exe}(w_i, f) \\
&= 1 - \lambda_0\, e^{\frac{d\,(f_{\max} - f)}{f_{\max} - f_{\min}}} \times \frac{w_i}{f} \\
&= 1 - \tilde{\lambda}_0\, e^{-\tilde{d} f} \times \frac{w_i}{f},
\end{aligned}
$$

where $\tilde{d} = \frac{d}{f_{\max} - f_{\min}}$ and $\tilde{\lambda}_0 = \lambda_0\, e^{\frac{d f_{\max}}{f_{\max} - f_{\min}}}$.

Note that this equation holds if $\varepsilon_i = \lambda(f) \times \frac{w_i}{f} \ll 1$. With, say, $\lambda(f) = 10^{-5}$, we need $\frac{w_i}{f} \le 10^3$ to get an accurate approximation with $\varepsilon_i \le 0.01$: the task should execute within 16 minutes (i.e., $10^3$ seconds). In other words, large (computationally demanding) tasks require reasonably high processing speeds with this model (which makes full sense in practice).

We want the reliability $R_i$ of each task $T_i$ to be greater than a given threshold, namely $R_i(f_{rel})$, hence enforcing a local constraint dependent on the task: $R_i \ge R_i(f_{rel})$. If task $T_i$ is executed only once at speed $f$, then the reliability of $T_i$ is $R_i = R_i(f)$. Since the reliability increases with speed, we must have $f \ge f_{rel}$ to match the reliability constraint. If task $T_i$ is executed twice (speeds $f^{(1)}$ and $f^{(2)}$), then the execution of $T_i$ is successful if and only if at least one of the attempts does not fail, so that the reliability of $T_i$ is $R_i = 1 - \left(1 - R_i(f^{(1)})\right)\left(1 - R_i(f^{(2)})\right)$, and this quantity should be at least equal to $R_i(f_{rel})$.
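The reliability model above can be written down directly. The sketch below evaluates $\lambda(f)$, the single-execution reliability (exact exponential or first-order form), and the reliability of a task executed twice. The function names are hypothetical, and the parameter values in the usage lines ($\lambda_0 = 10^{-5}$, $d = 3$, $f_{\min} = 0.1$, $f_{\max} = 1$, $w_i = 100$) are illustrative assumptions, not values from this paper.

```python
import math


def failure_rate(f, lam0, d, fmin, fmax):
    """lambda(f) = lam0 * exp(d * (fmax - f) / (fmax - fmin))."""
    return lam0 * math.exp(d * (fmax - f) / (fmax - fmin))


def reliability(w, f, lam0, d, fmin, fmax, exact=False):
    """R_i(f): exp(-lambda(f) * w/f), or its first-order form 1 - lambda(f) * w/f."""
    eps = failure_rate(f, lam0, d, fmin, fmax) * w / f  # lambda(f) * Exe(w, f)
    return math.exp(-eps) if exact else 1.0 - eps


def reliability_twice(w, f1, f2, lam0, d, fmin, fmax):
    """The task succeeds iff at least one of its two executions does not fail."""
    r1 = reliability(w, f1, lam0, d, fmin, fmax)
    r2 = reliability(w, f2, lam0, d, fmin, fmax)
    return 1.0 - (1.0 - r1) * (1.0 - r2)


params = dict(lam0=1e-5, d=3.0, fmin=0.1, fmax=1.0)     # hypothetical values
r_once = reliability(100.0, 0.8, **params)              # one execution at f = 0.8
r_twice = reliability_twice(100.0, 0.5, 0.5, **params)  # two executions at f = 0.5
print(r_once < r_twice)  # True: here two slow executions beat one faster one
```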

In this work, we restrict ourselves to a maximum of two executions of the same task, either on the same processor (what we call re-execution), or on two distinct processors (what we call replication). This is based on the following observation on the two cases in which a third execution of a task may be useful.

1. The deadline is such that even if all tasks are executed twice at the slowest possible speed, the execution time is still lower than the deadline. Then, the problem is to decide which tasks should be executed three times, and it is quite similar to the problem that we discuss in this paper.

2. Some tasks are too big to be re-executed, while there remains some time such that some small tasks could be executed at least three times at an even slower speed. In this case, the gain in energy consumption is negligible compared to the energy consumption of the big tasks at speed $f_{rel}$.

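Note that if both execution speeds are equal, i.e., $f^{(1)} = f^{(2)} = f$, then the reliability constraint writes $1 - \left(\tilde{\lambda}_0 w_i \frac{e^{-\tilde{d} f}}{f}\right)^2 \ge R_i(f_{rel})$. Since $R_i(f_{rel}) = 1 - \tilde{\lambda}_0 w_i \frac{e^{-\tilde{d} f_{rel}}}{f_{rel}}$, dividing both sides by $\tilde{\lambda}_0 w_i$ gives

$$\tilde{\lambda}_0 w_i \frac{e^{-2\tilde{d} f}}{f^2} \le \frac{e^{-\tilde{d} f_{rel}}}{f_{rel}}.$$

In the following, $f_{inf,i}$ is the solution to the equation $\tilde{\lambda}_0 w_i \frac{e^{-2\tilde{d} f}}{f^2} = \frac{e^{-\tilde{d} f_{rel}}}{f_{rel}}$; since the left-hand side decreases with $f$, task $T_i$ can be executed twice at any speed greater than or equal to $f_{inf,i}$ while meeting the reliability constraint. In practice, $f_{inf,i}$ is small enough so that tasks are usually executed faster than this speed, hence reinforcing the argument that it is meaningful to restrict to two executions of the same task.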
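Because the left-hand side $\tilde{\lambda}_0 w_i e^{-2\tilde{d} f}/f^2$ decreases with $f$, $f_{inf,i}$ can be computed numerically by bisection. A minimal sketch, assuming $\tilde{\lambda}_0$ and $\tilde{d}$ are given (the names `lam0_t`, `d_t` and `f_inf` are hypothetical). The search interval $(0, f_{rel}]$ is valid because at $f = f_{rel}$ the left-hand side equals $\left(1 - R_i(f_{rel})\right) e^{-\tilde{d} f_{rel}}/f_{rel}$, which is below the right-hand side whenever $R_i(f_{rel}) > 0$.

```python
import math


def f_inf(w, f_rel, lam0_t, d_t, iters=200):
    """Smallest speed at which two executions of a task of weight w reach R_i(f_rel).

    Solves lam0_t * w * exp(-2*d_t*f) / f**2 == exp(-d_t*f_rel) / f_rel by bisection;
    lam0_t and d_t stand for the constants tilde(lambda)_0 and tilde(d).
    """
    target = math.exp(-d_t * f_rel) / f_rel

    def excess(f):  # > 0 means two executions at speed f are not reliable enough
        return lam0_t * w * math.exp(-2.0 * d_t * f) / f ** 2 - target

    lo, hi = 1e-12 * f_rel, f_rel  # excess(lo) > 0 and excess(hi) <= 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return hi


fi = f_inf(w=100.0, f_rel=0.8, lam0_t=1e-4, d_t=3.0)  # hypothetical parameters
print(round(fi, 3))
```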

2.3 Energy 

The total energy consumption corresponds to the sum of the energy consumption of each task. Let $E_i$ be the energy consumed by task $T_i$. For one execution of $T_i$ at speed $f$, the corresponding energy consumption is $E_i(f) = \mathit{Exe}(w_i, f) \times f^3 = w_i \times f^2$, which corresponds to the dynamic part of the classical energy models of the literature [...]. Note that we do not take static energy into account, because all processors are up and alive during the whole execution.

If task $T_i$ is executed only once at speed $f$, then $E_i = E_i(f)$. Otherwise, if task $T_i$ is executed twice at speeds $f^{(1)}$ and $f^{(2)}$, it is natural to add up the energy consumed during both executions, just as we consider both execution times when enforcing the deadline on the makespan. Again, this corresponds to the worst-case execution scenario. We obtain $E_i = E_i(f^{(1)}) + E_i(f^{(2)})$. Note that some authors [19] consider only the energy spent for the first execution in the case of re-execution, which seems unfair: re-execution comes at a price both in the makespan and in the energy consumption. Finally, the total energy consumed by the schedule, which we aim at minimizing, is

$$E = \sum_{i=1}^{n} E_i.$$
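To make the worst-case accounting concrete, the sketch below evaluates the energy $E = \sum_i E_i$ and the makespan of a linear chain, counting every planned execution. The tuple encoding of a schedule, the mode names and the function name are assumptions made for illustration. Replicas run simultaneously on two processors, so the next task of the chain starts only when both copies have finished, whereas re-executions run back to back on one processor.

```python
from typing import Iterable, Tuple

# (weight w_i, speeds of its executions, mode); mode is "once", "reexec" or "replica"
Task = Tuple[float, Tuple[float, ...], str]


def chain_energy_and_makespan(tasks: Iterable[Task]) -> Tuple[float, float]:
    """Worst case: every planned execution runs and consumes energy w_i * f**2."""
    energy = 0.0
    makespan = 0.0
    for w, speeds, mode in tasks:
        energy += sum(w * f ** 2 for f in speeds)   # E_i = sum of E_i(f) over executions
        if mode == "replica":
            makespan += max(w / f for f in speeds)  # copies in parallel on two processors
        else:
            makespan += sum(w / f for f in speeds)  # sequential on one processor
    return energy, makespan


chain = [(2.0, (1.0,), "once"), (1.0, (0.5, 0.5), "reexec")]
print(chain_energy_and_makespan(chain))  # (2.5, 6.0)
```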

2.4 Optimization problem 

Given an application graph $\mathcal{G} = (V, \mathcal{E})$ and $p$ identical processors, Tri-Crit is the problem of finding a schedule that specifies which tasks should be executed twice, on which processor and at which speed each execution of a task should be processed, such that the total energy consumption $E$ is minimized, subject to the deadline $D$ on the makespan and to the local reliability constraints $R_i \ge R_i(f_{rel})$ for each $T_i \in V$.

We focus in this paper on the two following sub-problems that are restrictions of 
Tri-Crit to special application graphs: 

• Tri-Crit-Chain: the graph is such that

$$\mathcal{E} = \bigcup_{i=1}^{n-1} \{T_i \to T_{i+1}\};$$

• Tri-Crit-Indep: the graph is such that $\mathcal{E} = \emptyset$.

3 Linear chains 

In this section, we focus on the Tri-Crit-Chain problem, which was shown to be NP-hard even on a single processor [4]. We derive an FPTAS (Fully Polynomial Time Approximation Scheme) to solve the general problem with replication and re-execution on $p$ processors. We start with some preliminaries in Section 3.1 that allow us to characterize the shape of an optimal solution, and then we detail the FPTAS algorithm and its proof in Section 3.2.

3.1 Characterization 

First, we note that while Tri-Crit-Chain is NP-hard even on a single processor, the problem has polynomial complexity if neither replication nor re-execution can be used. Indeed, each task is executed only once, and the energy is minimized when all tasks are running at the same speed. Note that this result can be found in [...].

Lemma 1. Without replication or re-execution, solving Tri-Crit-Chain can be done in polynomial time, and each task is executed at speed $\max\left(f_{rel}, \frac{S}{D}\right)$.

Proof. For a linear chain of tasks, all tasks can be mapped on the same processor, and scheduled following the dependencies. No task may start earlier by using another processor, and all tasks run at the same speed. Since there is neither replication nor re-execution, each task must be executed at least at speed $f_{rel}$ for the reliability constraint. If $S/f_{rel} \le D$, running all tasks at $f_{rel}$ already meets the deadline. If $S/f_{rel} > D$, then the tasks should be executed at speed $S/D$ so that the deadline constraint is matched (recall that $S = \sum_{i=1}^{n} w_i$), hence the result. □
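Lemma 1 translates into a one-line rule. A minimal sketch, assuming the speed bound $f_{\max}$ is passed in to detect infeasible deadlines (the lemma itself does not discuss this case); the function name is hypothetical.

```python
def single_execution_plan(weights, D, f_rel, f_max):
    """Lemma 1: run every task once at max(f_rel, S/D); return (speed, energy)."""
    S = sum(weights)
    f = max(f_rel, S / D)
    if f > f_max:
        raise ValueError("deadline cannot be met even at f_max")
    return f, S * f ** 2  # all tasks at the same speed: E = S * f^2


# Here S/D = 0.5 < f_rel, so the reliability bound dictates the speed 0.8.
print(single_execution_plan([3.0, 5.0, 2.0], D=20.0, f_rel=0.8, f_max=1.0))
```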

Next, accounting for replication and re-execution, we characterize the shape of an optimal solution. For linear chains, it turns out that with a single processor, only re-execution will be used, while with two or more processors, there is an optimal solution that does not use re-execution, but only replication.

Lemma 2 (Replication or re-execution). When there is only one processor, it is opti- 
mal to only use re-execution to solve Tri-Crit-Chain. When there are at least two 
processors, it is optimal to only use replication to solve Tri-Crit-Chain. 

Proof. With one processor, the result is obvious, since replication cannot be used. With more than one processor, if re-execution was used on task $T_i$, for $1 \le i \le n$, we can derive a solution with the same energy consumption and a smaller execution time by using replication instead of re-execution. Indeed, all instances of tasks $T_j$, for $j < i$, must finish before $T_i$ starts its execution, and similarly, all instances of tasks $T_j$, for $j > i$, cannot start before both copies of $T_i$ have finished their execution. Therefore, there are always at least two processors available when executing $T_i$ for the first time, and the execution time is reduced when executing both copies of $T_i$ in parallel (replication) rather than sequentially (re-execution). □



We further characterize the shape of an optimal solution by showing that two copies 
of a same task can always be executed at the same speed. 

Lemma 3 (Speed of the replicas). For a linear chain, when a task is executed two 
times, it is optimal to have both replicas executed at the same speed. 

Proof. The proof for re-execution has been done by [...]: by convexity of the energy and reliability functions, it is always advantageous to execute the task twice at the same speed, even if the application is not a linear chain.

For replication, this lemma is only true in the case of linear chains. Indeed, because of the structure of the chain, as explained in the proof of Lemma 2, both copies of a task have the same constraints on starting and ending time, and hence it is better to execute them exactly at the same time. □

We can further characterize an optimal solution by providing detailed information about the execution speed of the tasks, depending on whether they are executed only once, re-executed, or replicated.

Proposition 1. If $D > \frac{S}{f_{rel}}$, then in any optimal solution of Tri-Crit-Chain, all tasks that are neither re-executed nor replicated are executed at speed $f_{rel}$. Furthermore, let $V_r \subseteq V$ be the subset of tasks that are either re-executed or replicated. Then, these tasks are all executed at the same speed $f_{re\text{-}ex}$, if $f_{re\text{-}ex} \ge \max\left(f_{\min}, \max_{T_i \in V_r} f_{inf,i}\right)$.

Proof. The proof for $p = 1$ (re-execution) can be found in [...]. We prove the result for $p \ge 2$, which corresponds to the case with replication and no re-execution (see Lemma 2). Note first that since $D > \frac{S}{f_{rel}}$, if no task is replicated, we have enough time to execute all tasks at speed $f_{rel}$.

Now, let us consider that task $T_i$ is replicated at speed $f_i$ (recall that both replicas are executed at the same speed, see Lemma 3), and task $T_j$ is executed only once at speed $f_j$. Then, we have $f_j \ge f_{rel}$ (reliability constraint on $T_j$), and $\frac{1}{\sqrt{2}} f_{rel} > f_i$ (otherwise, executing $T_i$ only once at speed $\max(f_i, f_{rel})$ would increase neither the energy, since $\max(f_i, f_{rel})^2 \le 2 f_i^2$, nor the execution time, while matching the reliability constraint).

If $f_j > f_{rel}$, let us show that we can rather execute $T_j$ at speed $f_{rel}$ and $T_i$ at a new speed $f_i' > f_i$, while keeping the same deadline: $\frac{w_i}{f_i'} + \frac{w_j}{f_{rel}} = \frac{w_i}{f_i} + \frac{w_j}{f_j}$. The energy consumption is then $2 w_i f_i'^2 + w_j f_{rel}^2$. Moreover, we know that the minimum of the function $2 w_i f_1^2 + w_j f_2^2$, given that $\frac{w_i}{f_1} + \frac{w_j}{f_2}$ is a constant (where $f_1$ and $f_2$ are the unknowns), is obtained for $f_1 = \frac{f_2}{2^{1/3}}$ (see Theorem 1 by [3]). Therefore, if the optimal speed of $T_j$ (i.e., $f_2$) is strictly greater than $f_{rel}$, then the optimal speed for $T_i$ is $f_i' = f_1 = \frac{f_2}{2^{1/3}} > \frac{f_{rel}}{2^{1/3}} > \frac{f_{rel}}{2^{1/2}}$, which means that we can improve both energy and execution time by executing $T_i$ only once (at speed $\max(f_i', f_{rel})$). Otherwise, the speed of $T_j$ is further constrained by $f_{rel}$, hence the previous equality ($f_1 = \frac{f_2}{2^{1/3}}$) does not hold anymore, and the function is minimized for $f_2 = f_{rel}$. The value of $f_i'$ can be easily deduced from the constraint on the deadline. This proves that all tasks that are not replicated are executed at speed $f_{rel}$.

Let $M = \max\left(f_{\min}, \max_{T_i \in V_r} f_{inf,i}\right)$. We now prove that if two tasks are replicated at a speed greater than $M$, then both tasks are executed at the same speed. Suppose that $T_i$ and $T_j$ are executed twice at speeds $f_i > f_j \ge M$. Let $f = \frac{(w_i + w_j) f_i f_j}{w_i f_j + w_j f_i}$, so that $\frac{w_i + w_j}{f} = \frac{w_i}{f_i} + \frac{w_j}{f_j}$. Then $f_i > f > f_j \ge M$, and therefore we can execute both tasks at speed $f$ while keeping the same deadline and matching the reliability constraints. By convexity, such an execution gives a better energy consumption. We can iterate on all the tasks that are replicated, hence obtaining the speed at which each task will be re-executed, $f_{re\text{-}ex}$. This concludes the proof. □
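The step borrowed from [3], namely that minimizing $2 w_i f_1^2 + w_j f_2^2$ under a fixed total time $\frac{w_i}{f_1} + \frac{w_j}{f_2} = C$ gives $f_1 = f_2/2^{1/3}$, can be checked numerically. The sketch below (hypothetical function name) eliminates $f_2$ through the time constraint and runs a ternary search over $f_1$; the objective is unimodal along the constraint because it is convex in $1/f_1$ and $1/f_2$. The weights and budget in the usage lines are arbitrary test values.

```python
def replicated_vs_single_speeds(wi, wj, C, iters=200):
    """Minimise 2*wi*f1**2 + wj*f2**2 subject to wi/f1 + wj/f2 == C.

    T_i is replicated (two copies at speed f1), T_j runs once at speed f2.
    """
    def f2_of(f1):
        return wj / (C - wi / f1)  # the time left for T_j fixes its speed

    def energy(f1):
        return 2 * wi * f1 ** 2 + wj * f2_of(f1) ** 2

    lo, hi = (wi / C) * (1 + 1e-9), 100.0 * (wi + wj) / C
    for _ in range(iters):  # ternary search on a unimodal function
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if energy(m1) < energy(m2):
            hi = m2
        else:
            lo = m1
    f1 = 0.5 * (lo + hi)
    return f1, f2_of(f1)


f1, f2 = replicated_vs_single_speeds(wi=3.0, wj=5.0, C=4.0)
print(f1 / f2, 2 ** (-1 / 3))  # the two ratios agree up to numerical tolerance
```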

Following Proposition 1, we are able to precisely define $f_{re\text{-}ex}$, and give a closed-form expression of the energy of a schedule.

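Corollary 1. Given a subset $V_r$ of tasks re-executed or replicated, let $X = \sum_{T_i \in V_r} w_i$, and

$$f_{re\text{-}ex} = \begin{cases} \max\left(f_{\min}, \frac{2X}{D f_{rel} - S + X} f_{rel}\right) & \text{if } p = 1, \\ \max\left(f_{\min}, \frac{X}{D f_{rel} - S + X} f_{rel}\right) & \text{if } p \ge 2. \end{cases}$$

Then, if $f_{re\text{-}ex} \ge \max_{T_i \in V_r} f_{inf,i}$, the optimal energy consumption is

$$(S - X) f_{rel}^2 + 2X f_{re\text{-}ex}^2. \tag{1}$$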

Note that the energy consumption only depends on X, and therefore Tri-Crit- 
Chain is equivalent in this case to the problem of finding the optimal set of tasks that 
have to be re-executed or replicated. 

Proof. Given a deadline $D$, the problem is to find the set of tasks re-executed (or replicated), and the speed of each task. Thanks to Proposition 1, we know that the tasks that are not in this set are executed at speed $f_{rel}$, and given the set of tasks re-executed or replicated, we can easily compute the optimal speed to execute each task in order to minimize the energy consumption: all tasks are executed at the same speed, and we have $\lambda \frac{X}{f_{re\text{-}ex}} + \frac{S - X}{f_{rel}} = D$, with $\lambda = 1$ in the case of replication ($p \ge 2$), and $\lambda = 2$ in the case of re-execution ($p = 1$). Hence the corollary. □
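Corollary 1 gives closed forms that are easy to evaluate. The sketch below (hypothetical function names) computes $f_{re\text{-}ex}$ and Equation (1), assuming $D > S/f_{rel}$ so that the denominator $D f_{rel} - S + X$ is positive, and that $f_{re\text{-}ex} \ge \max_{T_i \in V_r} f_{inf,i}$ as the corollary requires. The usage lines check that, without the $f_{\min}$ clamp, the deadline is met exactly.

```python
def f_reex(X, S, D, f_rel, p, f_min=0.0):
    """Common speed of the re-executed (p == 1) or replicated (p >= 2) tasks."""
    lam = 2 if p == 1 else 1  # re-execution doubles the time spent on V_r
    return max(f_min, lam * X * f_rel / (D * f_rel - S + X))


def corollary1_energy(X, S, D, f_rel, p, f_min=0.0):
    """Equation (1): (S - X) f_rel^2 + 2 X f_re-ex^2."""
    f = f_reex(X, S, D, f_rel, p, f_min)
    return (S - X) * f_rel ** 2 + 2 * X * f ** 2


S, D, f_rel, X = 10.0, 20.0, 1.0, 4.0
for p in (1, 2):
    lam = 2 if p == 1 else 1
    f = f_reex(X, S, D, f_rel, p)
    print(p, lam * X / f + (S - X) / f_rel)  # 20.0 up to rounding, for both p
```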

Remark. Note that if there is a task $T_i \in V_r$ such that $f_{inf,i} > f_{re\text{-}ex}$, then the optimal solution for this set of replicated tasks is obtained by executing $T_i$ at speed $f_{inf,i}$, and by executing all the other tasks at a new speed $f'_{re\text{-}ex}$, such that $D$ is exactly met. We can do this recursively until there are no more tasks $T_i$ such that $f_{inf,i} > f_{re\text{-}ex}$. Using the procedure COMPUTE_Vi($V_r$) (see Algorithm 1), we can compute the optimal energy consumption in a time polynomial in $|V_r|$.

Let $(V_i, f_{re\text{-}ex})$ be the result of COMPUTE_Vi($V_r$). Then the optimal energy consumption is $(S - X) f_{rel}^2 + \sum_{T_i \in V_i} 2 w_i f_{inf,i}^2 + \sum_{T_i \in V_r \setminus V_i} 2 w_i f_{re\text{-}ex}^2$.
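The fixed-point computation behind this Remark can be sketched as follows: tasks whose $f_{inf,i}$ exceeds the current common speed are pinned at $f_{inf,i}$, and the speed of the remaining tasks is recomputed so that the deadline is met exactly. This is a sketch under stated assumptions, not the paper's code: tasks of $V_r$ are given by index, $D > S/f_{rel}$, an exhausted time budget raises an error, and all names are hypothetical.

```python
def compute_vi(w_r, finf_r, S, D, f_rel, p, f_min=0.0):
    """Return (pinned, f): indices of V_r run at f_inf,i, and the speed of the others.

    w_r[k], finf_r[k]: weight and f_inf of the k-th task of V_r.
    """
    lam = 2 if p == 1 else 1              # two sequential executions when p == 1
    X = sum(w_r)
    budget = D - (S - X) / f_rel          # time available for the tasks of V_r
    pinned = set()
    while True:
        if len(pinned) == len(w_r):
            return pinned, f_min          # every task runs at its own f_inf,i
        rest = X - sum(w_r[k] for k in pinned)  # weight still at the common speed
        time_left = budget - sum(lam * w_r[k] / finf_r[k] for k in pinned)
        if time_left <= 0:
            raise ValueError("deadline cannot be met")
        f = max(f_min, lam * rest / time_left)
        new = pinned | {k for k, fk in enumerate(finf_r) if fk > f}
        if new == pinned:                 # fixed point reached
            return pinned, f
        pinned = new


def energy_with_pinned(w_r, finf_r, S, f_rel, pinned, f):
    """(S - X) f_rel^2 + sum over pinned of 2 w f_inf^2 + sum over others of 2 w f^2."""
    X = sum(w_r)
    e = (S - X) * f_rel ** 2
    for k, w in enumerate(w_r):
        e += 2 * w * (finf_r[k] if k in pinned else f) ** 2
    return e


pinned, f = compute_vi([1.0, 1.0], [0.9, 0.1], S=2.0, D=10.0, f_rel=1.0, p=2)
print(pinned, f)  # task 0 is pinned at 0.9; task 1 uses the recomputed speed
```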

Corollary 2. If $D > \frac{S}{f_{rel}}$, Tri-Crit-Chain can be solved using an exponential time exact algorithm.

Proof. The algorithm computes, for every subset $V_r$ of tasks, the energy consumption if all tasks in this subset are re-executed (or replicated), and it chooses one with the minimal energy consumption, which corresponds to an optimal solution. This takes exponential time, since there are $2^n$ subsets $V_r \subseteq V$, with $|V| = n$. □
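A direct transcription of this exponential exact algorithm is sketched below. To keep it short, it assumes that every $f_{inf,i}$ is negligible, so that Equation (1) applies to every subset (as Section 3.2 also assumes), and it skips subsets whose common speed would exceed $f_{\max}$; both simplifications and the function name are assumptions of this sketch.

```python
from itertools import combinations


def exact_chain(weights, D, f_rel, p, f_min=0.0, f_max=float("inf")):
    """Enumerate all subsets V_r and keep the one minimising Equation (1).

    Assumes D > S / f_rel and negligible f_inf,i. Runs in O(2^n * n) time.
    """
    S = sum(weights)
    if D * f_rel <= S:
        raise ValueError("requires D > S / f_rel")
    lam = 2 if p == 1 else 1
    best_energy, best_subset = S * f_rel ** 2, ()  # nothing duplicated
    n = len(weights)
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            X = sum(weights[i] for i in subset)
            f = max(f_min, lam * X * f_rel / (D * f_rel - S + X))
            if f > f_max:
                continue  # infeasible common speed
            energy = (S - X) * f_rel ** 2 + 2 * X * f ** 2
            if energy < best_energy:
                best_energy, best_subset = energy, subset
    return best_energy, best_subset


print(exact_chain([1.0, 2.0, 3.0], D=12.0, f_rel=1.0, p=2))  # (3.0, (0, 1, 2))
```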

Thanks to Corollary 1, we are also able to identify problem instances that can be
solved in polynomial time. 

Theorem 1. Tri-Crit-Chain can be solved in polynomial time in the following 
cases: 
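Algorithm 1: Computing re-execution speeds; tasks in $V_r$ are re-executed.

procedure COMPUTE_Vi($V_r$)
begin
    $X = \sum_{T_i \in V_r} w_i$;
    $f^{(0)}_{re\text{-}ex} = \max\left(f_{\min}, \frac{2X}{D f_{rel} - S + X} f_{rel}\right)$ if $p = 1$;
    $f^{(0)}_{re\text{-}ex} = \max\left(f_{\min}, \frac{X}{D f_{rel} - S + X} f_{rel}\right)$ if $p \ge 2$;
    $V_i^{(0)} = \emptyset$; $j = 0$;
    while $j = 0$ or $V_i^{(j)} \neq V_i^{(j-1)}$ do
        $j = j + 1$;
        $V_i^{(j)} = V_i^{(j-1)} \cup \{T_i \in V_r \mid f_{inf,i} > f^{(j-1)}_{re\text{-}ex}\}$;
        $f^{(j)}_{re\text{-}ex} = \max\left(f_{\min}, \dfrac{2\left(X - \sum_{T_i \in V_i^{(j)}} w_i\right)}{D - \frac{S - X}{f_{rel}} - \sum_{T_i \in V_i^{(j)}} \frac{2 w_i}{f_{inf,i}}}\right)$ if $p = 1$;
        $f^{(j)}_{re\text{-}ex} = \max\left(f_{\min}, \dfrac{X - \sum_{T_i \in V_i^{(j)}} w_i}{D - \frac{S - X}{f_{rel}} - \sum_{T_i \in V_i^{(j)}} \frac{w_i}{f_{inf,i}}}\right)$ if $p \ge 2$;
    return $\left(V_i^{(j)}, f^{(j)}_{re\text{-}ex}\right)$;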



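1. $D \le \frac{S}{f_{rel}}$ (neither re-execution nor replication);

2. $p = 1$, $D \ge \frac{1 + c}{c} \cdot \frac{S}{f_{rel}}$, where $c$ is the only positive root of the polynomial $7X^3 + 21X^2 - 3X - 1 = 0$, and hence $c = 4\sqrt{\frac{2}{7}} \cos\left(\frac{1}{3}\left(\pi - \tan^{-1}\frac{1}{\sqrt{7}}\right)\right) - 1$ ($c \approx 0.2838$), and for $1 \le i \le n$, $f_{inf,i} < \frac{2c}{1 + c} f_{rel}$ (all tasks can be re-executed);

3. $p \ge 2$, $D \ge \frac{2S}{f_{rel}}$, and for $1 \le i \le n$, $f_{inf,i} < \frac{1}{2} f_{rel}$ (all tasks can be replicated).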

Proof. First note that when $D \le \frac{S}{f_{rel}}$, the optimal solution is to execute each task only once, at speed $\frac{S}{D}$, since $\frac{S}{D} \ge f_{rel}$. Indeed, this solution matches both reliability and makespan constraints, and it was proven to be the optimal solution in Proposition 2 by [3] (it is easy to see that replication or re-execution would only increase the energy consumption).

Let us now consider that $D > \frac{S}{f_{rel}}$. We aim at showing that the minimum of the energy function is reached when the total weight of the re-executed or replicated tasks is

$$X = \begin{cases} c\,(D f_{rel} - S) & \text{if } p = 1, \\ D f_{rel} - S & \text{if } p \ge 2. \end{cases}$$

Then necessarily, when this total weight is greater than $S$, the optimal solution is to re-execute or replicate all the tasks, since the energy decreases with $X$ up to this value. Hence the theorem. We differentiate the two cases in the following ($p = 1$ or $p \ge 2$).

Case 1 ($p = 1$). We want to show that the minimum energy is reached when the total weight of the subset of tasks is exactly $c(D f_{rel} - S)$. Let $I = \{i \mid T_i \text{ is executed twice in the solution}\}$, and let $X = \sum_{i \in I} w_i$.
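We saw in Corollary 1 that the energy consumption cannot be lower than $(S - X) f_{rel}^2 + 2X f_{re\text{-}ex}^2$, where $f_{re\text{-}ex} = \frac{2X}{D f_{rel} - S + X} f_{rel}$. Therefore, we want to minimize

$$E(X) = (S - X) f_{rel}^2 + 2X \left(\frac{2X}{D f_{rel} - S + X} f_{rel}\right)^2.$$

If we differentiate $E$, we can see that the minimum is reached when

$$-1 + \frac{24X^2}{(D f_{rel} - S + X)^2} - \frac{16X^3}{(D f_{rel} - S + X)^3} = 0,$$

that is, $-(D f_{rel} - S + X)^3 + 24X^2(D f_{rel} - S + X) - 16X^3 = 0$, or

$$7X^3 + 21(D f_{rel} - S)X^2 - 3(D f_{rel} - S)^2 X - (D f_{rel} - S)^3 = 0.$$

The coefficients change sign only once, so by Descartes' rule of signs this equation has exactly one positive solution; dividing by $(D f_{rel} - S)^3$ shows that it is $X = c(D f_{rel} - S)$. Therefore the minimum is reached for this value of $X$, and then $f_{re\text{-}ex} = \frac{2c}{1 + c} f_{rel}$.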

When this value of $X$ exceeds $S$, re-executing each task is the best strategy to minimize the energy consumption, and that corresponds to the case $D \ge \frac{1 + c}{c} \cdot \frac{S}{f_{rel}}$. The re-execution speed may then be lower than $\frac{2c}{1 + c} f_{rel}$. Therefore, it may happen that $f_{inf,i} > f_{re\text{-}ex}$ for some task $T_i$. However, even with a tighter deadline, it would be better to re-execute $T_i$ at speed $\frac{2c}{1 + c} f_{rel}$ rather than to execute it only once at speed $f_{rel}$. Therefore, since $f_{inf,i} < \frac{2c}{1 + c} f_{rel}$, it is optimal to re-execute $T_i$ at the lowest possible speed, i.e., $f_{inf,i}$. Note that this changes the value of $f_{re\text{-}ex}$, and the call to COMPUTE_Vi($V$) (see Algorithm 1) returns the tasks that are executed at $f_{inf,i}$, together with the re-execution speed for all the other tasks.
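The value of $c$ and the location of the minimum can be checked numerically. The sketch below evaluates the closed form of $c$ from Theorem 1, verifies that it is a root of $7X^3 + 21X^2 - 3X - 1$, and compares $c\,(D f_{rel} - S)$ with a brute-force grid minimization of $E(X)$ for $p = 1$. The values $f_{rel} = 1$, $D f_{rel} - S = 5$ and $S = 100$ are arbitrary test inputs, and the function name is hypothetical.

```python
import math

# Closed form of c from Theorem 1
c = 4 * math.sqrt(2 / 7) * math.cos((math.pi - math.atan(1 / math.sqrt(7))) / 3) - 1


def reexec_energy(X, A, S, f_rel=1.0):
    """E(X) for p = 1, with A = D * f_rel - S > 0 and f_re-ex = 2X f_rel / (A + X)."""
    f = 2 * X * f_rel / (A + X)
    return (S - X) * f_rel ** 2 + 2 * X * f ** 2


A, S = 5.0, 100.0
grid = [A * k / 100_000 for k in range(1, 300_001)]  # X in (0, 3A]
x_best = min(grid, key=lambda x: reexec_energy(x, A, S))

print(round(c, 4))                           # 0.2838
print(7 * c ** 3 + 21 * c ** 2 - 3 * c - 1)  # ~0 (rounding error only)
print(x_best, c * A)                         # both close to 1.419
print(2 * c / (1 + c))                       # f_re-ex / f_rel at the optimum
```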



Case 2 ($p \ge 2$). Similarly, we want to show that, in this case, the minimum energy is reached when the total weight of the subset of tasks that are replicated is exactly $D f_{rel} - S$. Let $I = \{i \mid T_i \text{ is executed twice in the solution}\}$, and let $X = \sum_{i \in I} w_i$. We saw in Corollary 1 that the energy consumption cannot be lower than $(S - X) f_{rel}^2 + 2X f_{re\text{-}ex}^2$, where $f_{re\text{-}ex} = \frac{X}{D f_{rel} - S + X} f_{rel}$. Therefore, we want to minimize

$$E(X) = (S - X) f_{rel}^2 + 2X \left(\frac{X}{D f_{rel} - S + X} f_{rel}\right)^2.$$

If we differentiate $E$, we can see that the minimum is reached when

$$-1 + \frac{6X^2}{(D f_{rel} - S + X)^2} - \frac{4X^3}{(D f_{rel} - S + X)^3} = 0,$$

that is, $-(D f_{rel} - S + X)^3 + 6X^2(D f_{rel} - S + X) - 4X^3 = 0$, or

$$X^3 + 3(D f_{rel} - S)X^2 - 3(D f_{rel} - S)^2 X - (D f_{rel} - S)^3 = 0.$$

The only positive solution to this equation is $X = D f_{rel} - S$, and therefore the minimum is reached for this value of $X$, and then $f_{re\text{-}ex} = \frac{1}{2} f_{rel}$.

When this value of $X$ exceeds $S$, replicating each task is the best strategy to minimize the energy consumption, and that corresponds to the case $D \ge \frac{2S}{f_{rel}}$. Similarly to Case 1, it is easy to see that each task should be replicated, even if $f_{inf,i} > f_{re\text{-}ex}$, since $f_{inf,i} < \frac{1}{2} f_{rel}$. The optimal solution can also be obtained with a call to COMPUTE_Vi($V$). □
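For Case 2, the claim that $X = D f_{rel} - S$ is the only positive root can be seen by factoring. Writing $A = D f_{rel} - S > 0$ (a shorthand introduced here for this derivation only):

$$X^3 + 3A X^2 - 3A^2 X - A^3 = (X - A)\,(X^2 + 4A X + A^2), \qquad X^2 + 4A X + A^2 = 0 \iff X = (-2 \pm \sqrt{3})\,A < 0.$$

Since $\frac{dE}{dX} = f_{rel}^2 \, \frac{(X - A)(X^2 + 4AX + A^2)}{(A + X)^3}$, the derivative is negative for $0 < X < A$ and positive for $X > A$, which also justifies replicating every task when $A \ge S$.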



3.2 FPTAS for Tri-Crit-Chain 

We derive in this section a fully polynomial time approximation scheme (FPTAS) for Tri-Crit-Chain, based on the FPTAS for SUBSET-SUM [10], and the results of Section 3.1. Without loss of generality, we use the term replication for either re-execution or replication, since both scenarios have already been clearly identified. The problem consists in identifying the set of replicated tasks $V_r$; the optimal solution can then be derived from Corollary 1: it depends only on the total weight of these tasks, $\sum_{T_i \in V_r} w_i$, denoted in the following as $w(V_r)$.

Note that we do not account in this section for $f_{inf,i}$ or $f_{\min}$ for readability reasons: $f_{inf,i}$ can usually be neglected because $\lambda_0 w_i / f$ is supposed to be very small whatever $f$, and $f_{\min}$ simply adds subcases to the proofs (rather than an execution at speed $f$, the speed should be $\max(f, f_{\min})$).

First, we introduce a few preliminary functions in Algorithm 2, and we exhibit their properties. These are the basis of the approximation algorithm.

When $D > \frac{S}{f_{rel}}$, X-OPT($V$, $D$, $p$) returns the optimal value for the weight $w(V_r)$ of the subset of replicated tasks $V_r$, i.e., the value that minimizes the energy consumption for Tri-Crit-Chain. The optimality comes directly from the proof of Theorem 1.

Given a value $X$, which corresponds to $w(V_r)$, ENERGY($V$, $D$, $p$, $X$) returns the optimal energy consumption when a subset of tasks $V_r$ of total weight $X$ is replicated.

Then, the function Trim$(L, \varepsilon, X)$ trims a sorted list $L = [L_0, \ldots, L_{m-1}]$ in time $O(m)$, given $L$ and $\varepsilon$; $L$ is sorted in non-decreasing order. The function returns a trimmed list in which two consecutive elements differ by at least a factor $(1 + \varepsilon)$, except the last element, which is the smallest element of $L$ strictly greater than $X$. This trimming procedure is quite similar to the one used for SUBSET-SUM [10], except that the latter keeps only elements lower than $X$. Indeed, SUBSET-SUM can be expressed as follows: given $n$ strictly positive integers $a_1, \ldots, a_n$, and a positive integer $X$, we wish to find a subset $I$ of $\{1, \ldots, n\}$ such that $\sum_{i \in I} a_i$ is as large as possible, but not larger than $X$. In our case, the optimal solution may be obtained either by approaching $X$ from below or from above.
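A Python sketch of Trim, following the description above. Elements up to $X$ are thinned by the factor $(1+\varepsilon)$, and only the smallest element strictly above $X$ is retained. Stopping at the first element above $X$ is how this sketch realizes that description; the names are illustrative.

```python
def trim(L, eps, X):
    """Sketch of TRIM(L, eps, X) for a list L sorted in non-decreasing order.

    Elements not greater than X are trimmed as in the SUBSET-SUM FPTAS: an
    element is kept only if it exceeds the last kept one by more than a
    factor (1 + eps).  The smallest element strictly greater than X is kept,
    and nothing above it.  Runs in O(len(L)) time.
    """
    kept, last = [L[0]], L[0]
    for v in L[1:]:
        if v > X:
            if last <= X:          # first element above X: keep it, then stop
                kept.append(v)
            break                  # every later element is even larger
        if v > last * (1 + eps):   # far enough from the last kept element
            kept.append(v)
            last = v
    return kept


print(trim([0, 1, 2, 3, 10, 11, 12, 20], 0.5, 10.5))   # [0, 1, 2, 10, 11]
```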

Finally, the approximation algorithm is Approx-Chain$(V, D, p, \varepsilon)$ (see Algorithm 2), where $0 < \varepsilon < 1$. It returns an energy consumption $E$ that is not greater than $(1 + \varepsilon)$ times the optimal energy consumption. Note that if $L = [L_0, \ldots, L_{m-1}]$, then Add-List$(L, x)$ adds element $x$ at the end of list $L$ (i.e., it returns the list $[L_0, \ldots, L_{m-1}, x]$); $L + w$ is the list $[L_0 + w, \ldots, L_{m-1} + w]$; and Merge-Lists$(L, L')$ merges two sorted lists (and returns a sorted list).
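The scheme can be exercised end to end with the following Python sketch. It is restricted to $p \ge 2$, because the constant $c$ for $p = 1$ is not given here. It also assumes integer weights and no $f_{min}$ or $f_{max}$ constraint. The trimming parameter $\varepsilon/(28 \times 2n)$ is taken from Algorithm 2. The brute-force `exact_chain` helper is an addition of this sketch, only practical for a handful of tasks; it serves as a reference point for the $(1+\varepsilon)$ guarantee.

```python
import math
from itertools import combinations


def chain_energy(S, D, f_rel, x):
    """ENERGY for p >= 2 and f_min = 0: weight S - x once at f_rel, weight x twice."""
    f_reex = x * f_rel / (D * f_rel - S + x)
    return (S - x) * f_rel ** 2 + 2 * x * f_reex ** 2


def approx_chain(weights, D, f_rel, eps):
    """Sketch of APPROX-CHAIN for p >= 2 (integer weights, S/f_rel < D < 2S/f_rel)."""
    n, S = len(weights), sum(weights)
    X = math.floor(D * f_rel - S)                 # floor of X-OPT for p >= 2
    delta = eps / (28 * 2 * n)                    # trimming parameter of Algorithm 2
    L = [0]                                       # L^(0)
    for w in weights:
        merged = sorted(set(L) | {v + w for v in L})   # MERGE-LISTS(L, L + w)
        kept, last = [merged[0]], merged[0]            # TRIM(merged, delta, X)
        for v in merged[1:]:
            if v > X:
                if last <= X:                      # keep the smallest value above X
                    kept.append(v)
                break
            if v > last * (1 + delta):
                kept.append(v)
                last = v
        L = kept
    # The two largest elements: Y_1 <= X < Y_2, or both are at most X.
    return min(chain_energy(S, D, f_rel, y) for y in L[-2:])


def exact_chain(weights, D, f_rel):
    """Brute-force reference over all subsets V_r (only for small n)."""
    S = sum(weights)
    sums = {sum(c) for k in range(len(weights) + 1)
            for c in combinations(weights, k)}
    return min(chain_energy(S, D, f_rel, x) for x in sums)


w = [3, 5, 2, 7, 4]                               # S = 21, f_rel = 1, D = 30
print(approx_chain(w, 30, 1.0, 0.5), exact_chain(w, 30, 1.0))   # 16.5 16.5
```

On such small integer instances the trimming removes almost nothing, so both functions agree. The guarantee only becomes visible for larger weights, where the trimmed lists are much shorter than the full set of subset sums.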

We now prove that this approximation scheme is an FPTAS: 

Theorem 2. Approx-Chain is a fully polynomial time approximation scheme for 
Tri-Crit-Chain. 

Proof. We assume that

• if $p = 1$, then $\frac{S}{f_{rel}} < D < \frac{(1+c)S}{c f_{rel}}$;

• if $p \ge 2$, then $\frac{S}{f_{rel}} < D < \frac{2S}{f_{rel}}$.

Otherwise the optimal solution is obtained in polynomial time (see Theorem 1).

Let $I_{inf} = \{V' \subseteq V \mid w(V') \le \text{X-OPT}(V, D, p)\}$, and $I_{sup} = \{V'' \subseteq V \mid w(V'') > \text{X-OPT}(V, D, p)\}$. Note that $I_{inf}$ is not empty, since $\emptyset \in I_{inf}$.

First we characterize the solution with the following lemma: 

Lemma 4. Suppose $D > S/f_{rel}$. Then in the solution of Tri-Crit-Chain, the subset of replicated tasks $V_r$ is either an element $V' \in I_{inf}$ such that $w(V')$ is maximum, or an element $V'' \in I_{sup}$ such that $w(V'')$ is minimum.

Proof. Recall first that, according to Proposition 1, the energy consumption of a linear chain does not depend on the number of replicated tasks, but only on the sum of their weights.






Algorithm 2: Approximation algorithm for Tri-Crit-Chain.

function X-OPT(V, D, p)
begin
    if p = 1 then return c(D·f_rel - S);
    else return D·f_rel - S;

function ENERGY(V, D, p, X)
begin
    if p = 1 then return (S - X)·f_rel^2 + 2X·(max(f_min, 2X·f_rel/(D·f_rel - S + X)))^2;
    else return (S - X)·f_rel^2 + 2X·(max(f_min, X·f_rel/(D·f_rel - S + X)))^2;

function TRIM(L, ε, X)
begin
    m = |L|; L = [L_0, ..., L_{m-1}]; L' = [L_0]; last = L_0;
    for i = 1 to m - 1 do
        if (last ≤ X and L_i > X) or (L_i ≤ X and L_i > last × (1 + ε)) then
            L' = ADD-LIST(L', L_i); last = L_i;
    return L';

function APPROX-CHAIN(V, D, p, ε)
begin
    X = ⌊X-OPT(V, D, p)⌋; L^(0) = [0];
    for i = 1 to n do
        L^(i) = MERGE-LISTS(L^(i-1), L^(i-1) + w_i);
        L^(i) = TRIM(L^(i), ε/(28 × 2n), X);
    Let Y_1 < Y_2 be the two largest elements of L^(n);
    return min(ENERGY(V, D, p, Y_1), ENERGY(V, D, p, Y_2));






Then the lemma is obvious by convexity of the functions, and since X-Opt returns the optimal value of $w(V_r)$, the weight of the replicated tasks. Therefore, the closer the weight of the set of replicated tasks is to the optimal weight, the better the solution. Finally, any element in $I_{inf}$ is a solution (since we have a solution for X-Opt), and if the minimal element (if it exists) of $I_{sup}$ is not a solution ($f_{re\text{-}ex}$ too large because of time constraints), then no element of $I_{sup}$ can be a better solution. □

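We are now ready to prove Theorem 2. Let $X_1 = \max_{V_1 \in I_{inf}} w(V_1)$, and $X_2 = \min_{V_2 \in I_{sup}} w(V_2)$. Thanks to Lemma 4, the optimal set of replicated tasks $V_o$ is such that $X_o = w(V_o) = X_1$ or $X_o = X_2$. The corresponding energy consumption for $p \ge 2$ is (Corollary 1):

$$E_{opt} = (S - X_o) f_{rel}^2 + \frac{2X_o^3}{(D f_{rel} - S + X_o)^2} f_{rel}^2.$$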

The solution returned by Approx-Chain corresponds either to $Y_1$ or to $Y_2$, where $Y_1$ and $Y_2$ are the two largest elements of the trimmed list. We first prove that at least one of these two elements, denoted $X_a$, is such that $X_a \le X_o \le (1 + \varepsilon')X_a$, where $\varepsilon' = \varepsilon/28$.

Existence of $X_a$ such that $X_a \le X_o \le (1 + \varepsilon')X_a$. We differentiate two cases.

(a) If $Y_2 > X$, then $Y_1$ is the value obtained by the FPTAS for SUBSET-SUM [10] with the approximation ratio $\varepsilon'$, since it is the largest value not greater than $X$, and our algorithm is identical for such values. Moreover, note that $X_1$ is the optimal solution of SUBSET-SUM by definition, and therefore $Y_1 \le X_1 \le (1 + \varepsilon')Y_1$. If $X_o = X_1$, the value $X_a = Y_1$ satisfies the property.

If $X_o = X_2$, we prove that the property remains valid, by considering the SUBSET-SUM problem with a bound $X_2$ instead of $X$. Then, since $Y_2 > X$, we have $Y_2 \ge X_2$ by definition of $X_2$. Moreover, Approx-Chain is not removing any element of the list greater than $Y_2$, and therefore all elements between $X$ and $X_2$ are kept, similarly to the FPTAS for SUBSET-SUM. If $Y_2 = X_2$, then $X_a = Y_2$ satisfies the property. Otherwise, $Y_1$ is the result of the FPTAS for SUBSET-SUM with a bound $X_2$, whose optimal solution is $X_2$, and therefore $Y_1$ is such that $Y_1 \le X_2 \le (1 + \varepsilon')Y_1$; $X_a = Y_1$ satisfies the property.

(b) If $Y_2 \le X$, no elements greater than $X$ have been removed from the lists, and Approx-Chain has been identical to the FPTAS for SUBSET-SUM. Then, $X_a = Y_2$ is the solution, which is valid both for SUBSET-SUM applied with the original bound $X$ (optimal solution $X_1$) and with the modified bound $X_2$ (optimal solution $X_2$). Therefore, $Y_2 \le X_1 \le (1 + \varepsilon')Y_2$ and $Y_2 \le X_2 \le (1 + \varepsilon')Y_2$, which concludes the proof.

We have shown that there always is $X_a$ (either $Y_1$ or $Y_2$) such that $X_a \le X_o \le (1 + \varepsilon')X_a$. Next, we show that the energy $E_a$ obtained with this value $X_a$ is such that

$$E_{opt} \le E_a \le (1 + \varepsilon)E_{opt}.$$

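Approximation ratio on the energy: $E_a \le (1 + \varepsilon)E_{opt}$. Let us consider first that $p \ge 2$. Then we have $E_a = (S - X_a) f_{rel}^2 + \frac{2X_a^3}{(D f_{rel} - S + X_a)^2} f_{rel}^2$. Re-using the previous inequalities on $X_a$ (namely $X_o/(1+\varepsilon') \le X_a \le X_o$), we obtain:

$$\frac{E_a}{f_{rel}^2} \le S - \frac{X_o}{1+\varepsilon'} + \frac{2X_o^3}{\left(D f_{rel} - S + \frac{X_o}{1+\varepsilon'}\right)^2}.$$

Then, since $S - \frac{X_o}{1+\varepsilon'} = (S - X_o) + \frac{\varepsilon' X_o}{1+\varepsilon'} \le (S - X_o) + \varepsilon' S$ and $D f_{rel} - S > 0$, this can be rewritten so that $E_{opt}$ appears:

$$\begin{aligned}
\frac{E_a}{f_{rel}^2} &\le \big((S - X_o) + \varepsilon' S\big) + (1+\varepsilon')^2 \frac{2X_o^3}{\big((1+\varepsilon')(D f_{rel} - S) + X_o\big)^2} \\
&\le \big((S - X_o) + \varepsilon' S\big) + (1+\varepsilon')^2 \frac{2X_o^3}{(D f_{rel} - S + X_o)^2} \\
&= \big((S - X_o) + \varepsilon' S\big) + (1+\varepsilon')^2 \left(\frac{E_{opt}}{f_{rel}^2} - (S - X_o)\right) \\
&= (1+\varepsilon')^2 \frac{E_{opt}}{f_{rel}^2} - \big((1+\varepsilon')^2 - 1\big)(S - X_o) + \varepsilon' S \\
&\le (1+\varepsilon')^2 \frac{E_{opt}}{f_{rel}^2} + \varepsilon' S.
\end{aligned}$$

The case $p = 1$ leads to the same inequality. The only difference is in the energy $E_a$, where $2X_a^3$ is replaced by $(2X_a)^3$; the same difference holds for $E_{opt}$, where $2X_o^3$ is replaced by $(2X_o)^3$.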
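The chain of inequalities above only uses $X_o/(1+\varepsilon') \le X_a \le X_o \le S$ and $D f_{rel} > S$. It can therefore be spot-checked numerically for arbitrary values of $X_o$, not only the optimal one. The following Python sketch treats the case $p \ge 2$; its sampling ranges are arbitrary choices. It samples such values and tests $E(X_a) \le (1+\varepsilon')^2 E(X_o) + \varepsilon' S f_{rel}^2$. Random sampling is of course no substitute for the proof.

```python
import random


def energy_p2(S, D, f_rel, X):
    """Chain energy for p >= 2 when the replicated tasks weigh X in total."""
    return (S - X) * f_rel ** 2 + 2 * X ** 3 * f_rel ** 2 / (D * f_rel - S + X) ** 2


def spot_check(trials=20000, seed=0):
    """Sample S/f_rel < D < 2S/f_rel and X_o/(1+eps') <= X_a <= X_o <= S, then
    test E(X_a) <= (1+eps')^2 E(X_o) + eps' S f_rel^2 (up to rounding)."""
    rng = random.Random(seed)
    for _ in range(trials):
        f_rel = rng.uniform(0.5, 2.0)
        S = rng.uniform(1.0, 100.0)
        D = rng.uniform(1.0001, 2.0) * S / f_rel
        eps1 = rng.uniform(1e-4, 1.0)                # plays the role of eps'
        X_o = rng.uniform(0.0, S)
        X_a = rng.uniform(X_o / (1 + eps1), X_o)
        lhs = energy_p2(S, D, f_rel, X_a)
        rhs = (1 + eps1) ** 2 * energy_p2(S, D, f_rel, X_o) + eps1 * S * f_rel ** 2
        if lhs > rhs * (1 + 1e-12):
            return False
    return True


print(spot_check())   # True
```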

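Finally, note that with no reliability constraints, each task is executed only once at speed $S/D$, and therefore the energy consumption is at least $E_{opt} \ge S^3/D^2$. Moreover, by hypothesis, $D < 5S/f_{rel}$ (for $p \ge 1$). Therefore, $S < 25 E_{opt}/f_{rel}^2$ and $E_a/E_{opt} \le (1+\varepsilon')^2 + 25\varepsilon'$.

We conclude, since $\varepsilon' \le 1$, that

$$\frac{E_a}{E_{opt}} \le 1 + 27\varepsilon' + \varepsilon'^2 \le 1 + 28\varepsilon' = 1 + \varepsilon.$$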



Conclusion. The energy consumption returned by Approx-Chain, denoted as $E_{algo}$, is such that $E_{algo} \le E_a$, since we take the minimum of the consumptions obtained for $Y_1$ and $Y_2$, and $X_a$ is either $Y_1$ or $Y_2$. Therefore, $E_{algo} \le (1 + \varepsilon)E_{opt}$.

It is clear that the algorithm is polynomial both in the size of the instance and in $1/\varepsilon$. The trimming function and Approx-Chain have the same complexity as in the original approximation scheme for SUBSET-SUM (see [10]), and all other operations (X-Opt, Energy) are polynomial in the problem size. □



4 Independent tasks 

In this section, we focus on the problem of scheduling independent tasks, Tri-Crit-Indep. Similarly to Tri-Crit-Chain, we know that Tri-Crit-Indep is NP-hard, even on a single processor. We first prove in Section 4.1 that there exists no constant factor approximation algorithm for this problem, unless P=NP. We discuss and characterize solutions to Tri-Crit-Indep in Section 4.2, while highlighting the intrinsic difficulty of the problem. The core result is a constant factor approximation algorithm with a relaxation on the constraint on the makespan (Section 4.3).

4.1 Inapproximability of Tri-Crit-Indep 

Lemma 5. For all $\lambda > 1$, there does not exist any $\lambda$-approximation of Tri-Crit-Indep, unless P = NP.

Proof. Let us assume that there is a $\lambda$-approximation algorithm for Tri-Crit-Indep. We consider an instance $\mathcal{I}_1$ of 2-Partition: given $n$ strictly positive integers $a_1, \ldots, a_n$, does there exist a subset $I$ of $\{1, \ldots, n\}$ such that $\sum_{i \in I} a_i = \sum_{i \notin I} a_i$? Let $S = \sum_{i=1}^{n} a_i$.

We build the following instance $\mathcal{I}_2$ of our problem. We have $n$ independent tasks $T_i$ to be mapped on $p = 2$ processors, and:

• task $T_i$ has a weight $w_i = a_i$;

• $f_{min} = f_{rel} = f_{max} = S/2$;

• $D = 1$.

We use the $\lambda$-approximation algorithm to solve $\mathcal{I}_2$, and the solution of the algorithm $E_{algo}$ is such that $E_{algo} \le \lambda E_{opt}$, where $E_{opt}$ is the optimal solution. We consider the two following cases.

(i) If the $\lambda$-approximation algorithm returns a solution, then necessarily all tasks are executed exactly once at speed $f_{max}$, since $\sum_{i=1}^{n} w_i/f_{max} = 2$ and there are two processors. Moreover, because of the makespan constraint, the load on each processor is equal. Let $I$ be the indices of the tasks executed on the first processor. We have $\sum_{i \in I} a_i = \sum_{i \notin I} a_i$, therefore $I$ is also a solution to $\mathcal{I}_1$.

(ii) If the $\lambda$-approximation algorithm does not return a solution, then there is no solution to $\mathcal{I}_1$. Otherwise, if $I$ were a solution to $\mathcal{I}_1$, there would be a solution to $\mathcal{I}_2$ in which the tasks of $I$ are executed on the first processor and the other tasks on the second processor. Since $E_{algo} \le \lambda E_{opt}$, the approximation algorithm should have returned a valid solution.

Therefore, the result of the algorithm for $\mathcal{I}_2$ allows us to conclude in polynomial time whether there is a solution to the instance $\mathcal{I}_1$ of 2-Partition or not. Since 2-Partition is NP-complete [12], the inapproximability result is true unless P=NP. □
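The construction of $\mathcal{I}_2$, and the link between partitions and feasible schedules, can be illustrated with the following Python sketch. The dictionary layout and the helper names are assumptions made for the illustration. Since $f_{min} = f_{rel} = f_{max}$, every task runs exactly once at speed $S/2$. A schedule therefore meets $D = 1$ exactly when it encodes a partition.

```python
def build_instance(a):
    """Instance I_2 of the reduction, built from a 2-Partition instance a."""
    S = sum(a)
    return {"weights": list(a), "p": 2, "D": 1.0,
            "f_min": S / 2, "f_rel": S / 2, "f_max": S / 2}


def finishing_times(a, I):
    """Run the tasks of I on processor 1 and the others on processor 2,
    each exactly once at the only available speed f_max = S / 2."""
    f = build_instance(a)["f_max"]
    t1 = sum(a[i] for i in I) / f
    t2 = sum(a[i] for i in range(len(a)) if i not in I) / f
    return t1, t2


def meets_deadline(a, I):
    """True when the schedule induced by I finishes by the deadline D = 1."""
    return max(finishing_times(a, I)) <= build_instance(a)["D"]


a = [3, 1, 1, 2, 2, 1]                       # S = 10
print(finishing_times(a, {0, 3}))            # (1.0, 1.0): a valid partition
print(meets_deadline(a, {0}))                # False: loads 3 and 7
```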

4.2 Characterization 

As discussed in Section 1, the problem of scheduling independent tasks is usually close to a problem of load balancing, and can be efficiently approximated for various mono-criterion versions of the problem (minimizing the makespan or the energy, for instance). However, the tri-criteria problem turns out to be much harder, and cannot be approximated, as seen in Section 4.1, even when reliability is not a constraint.

Adding reliability further complicates the problem, since we no longer have the property that on each processor, there is a constant execution speed for the tasks executed on this processor. Indeed, some processors may process both tasks that are not replicated (or re-executed), hence at speed $f_{rel}$, and replicated tasks at a slower speed. Similarly to Section 3.2, we use the term replication for either re-execution or replication; if a task is replicated, it means it is executed two times, and it appears two times in the load of processors, be it the same processor or two distinct processors.

Furthermore, contrary to the Tri-Crit-Chain problem (see Lemma 3), we do not always have the same execution speed for both executions of a task.

Proposition 2. In an optimal solution of Tri-Crit-Indep, if a task $T_i$ is executed twice:

• if both executions are on the same processor, then both are executed at the same speed, lower than $f_{rel}/\sqrt{2}$;

• however, when the two executions of this task are on distinct processors, they are not necessarily executed at the same speed. Furthermore, one of the two speeds can be greater than $f_{rel}/\sqrt{2}$.

Moreover, we have $w_i < D f_{rel}/\sqrt{2}$.

Proof. We start by proving the properties on the speeds. When both executions occur on the same processor, this property was shown by [4]. If the speed were not lower than $f_{rel}/\sqrt{2}$, a single execution at speed $f_{rel}$ would lead to a better energy consumption (and a lower execution time).

In the case of distinct processors, we give an example in which the optimal solution uses different speeds for a replicated task, with one speed greater than $f_{rel}/\sqrt{2}$. Note that one of the speeds is necessarily lower than $f_{rel}/\sqrt{2}$; otherwise a solution with only one execution of this task at speed $f_{rel}$ would be better, similarly to the case with re-execution.

Consider a problem instance with two processors, $f_{rel} = f_{max}$, $D = \ldots$, and three tasks such that $w_1 = 5$, $w_2 = 3$, and $w_3 = 1$. Because of the time constraints, $T_1$ and $T_2$ are necessarily executed on two distinct processors, and neither of them can be re-executed on its processor. The problem consists in scheduling task $T_3$ to minimize the energy consumption. There are three possibilities:

• $T_3$ is executed only once on any of the processors, at speed $f_{rel} = f_{max}$;

• $T_3$ is executed twice on the same processor. It is then executed on the same processor as $T_2$, hence having an execution time of $D - 3/f_{max}$, and therefore both executions are done at speed $\frac{2}{D - 3/f_{max}}$;

• $T_3$ is executed once on the same processor as $T_1$, at speed $\frac{1}{D - 5/f_{max}}$, and once on the other processor, at speed $\frac{1}{D - 3/f_{max}}$.

It is easy to see that the minimum energy consumption is obtained with the last solution, and that $\frac{1}{D - 5/f_{max}} > \frac{f_{rel}}{\sqrt{2}}$, hence the result.

Finally, note that at least one of the executions of the task must be at a speed lower than $f_{rel}/\sqrt{2}$, and the deadline is $D$. Hence, in order to match the deadline, the weight of the replicated task has to be strictly lower than $D f_{rel}/\sqrt{2}$. □
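The comparison between the three options of the example can be reproduced numerically. The sketch below assumes $f_{rel} = f_{max} = 1$ and the illustrative deadline $D = 6.25$; this value is an assumption of the sketch, not taken from the text. It ignores $f_{\inf,i}$ and uses the energy $w f^2$ for a weight $w$ executed at speed $f$. For this deadline it also checks that the faster copy runs above $f_{rel}/\sqrt{2}$.

```python
import math

F_MAX = 1.0              # f_rel = f_max, normalized to 1 (assumption)
W1, W2, W3 = 5, 3, 1     # task weights of the example


def options(D):
    """Energies of the three ways to schedule T3, plus the speed of the copy
    placed next to T1 in the third option."""
    once = W3 * F_MAX ** 2                          # single execution at f_rel
    s_same = 2 * W3 / (D - W2 / F_MAX)              # both copies next to T2
    twice_same = 2 * W3 * s_same ** 2
    s1 = W3 / (D - W1 / F_MAX)                      # copy next to T1
    s2 = W3 / (D - W2 / F_MAX)                      # copy next to T2
    split = W3 * s1 ** 2 + W3 * s2 ** 2
    return {"once": once, "twice_same": twice_same, "split": split}, s1


energies, s1 = options(6.25)                        # illustrative deadline
print(min(energies, key=energies.get))              # split
print(s1 > F_MAX / math.sqrt(2))                    # True (0.8 > 0.707...)
```

Other deadlines can change the ranking. For instance, when $D$ is very close to $6/f_{max}$, the copy next to $T_1$ must run almost at $f_{max}$, and a single execution becomes preferable.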

Because of this proposition, usual load balancing algorithms are likely to fail, since 
processors handling only non-replicated tasks should have a much higher load, and 
speeds of replicated tasks may be very different from one processor to another in the 
optimal solution. 

We now derive lower bounds on the energy consumption, that will be useful to 
design an approximation algorithm in the next section. 

Proposition 3 (Lower bound without reliability). The optimal solution of Tri-Crit-Indep cannot have an energy lower than $\frac{S^3}{(pD)^2}$.
Proof. Let us consider the problem of minimizing the energy consumption, with a deadline constraint $D$, but without accounting for the constraint on reliability. A lower bound is obtained if the load on each processor is exactly equal to $S/p$ and the speed of each processor is constant and equal to $S/(pD)$. The corresponding energy consumption is $S\left(\frac{S}{pD}\right)^2 = \frac{S^3}{(pD)^2}$, hence the bound. □
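The bound follows from the convexity of $l \mapsto l^3/D^2$: for a fixed total load $S$, splitting it evenly over the $p$ processors minimizes the energy. A small Python check, assuming that each processor runs at a constant speed equal to its load divided by $D$; the random splits are only an illustration.

```python
import random


def lower_bound(S, p, D):
    """Proposition 3: S^3 / (pD)^2."""
    return S ** 3 / (p * D) ** 2


def energy_of_loads(loads, D):
    """Each processor runs its load l at the constant speed l / D,
    which costs l * (l / D) ** 2 = l ** 3 / D ** 2."""
    return sum(l ** 3 / D ** 2 for l in loads)


S, p, D = 12.0, 3, 2.0
print(lower_bound(S, p, D), energy_of_loads([4.0, 4.0, 4.0], D))   # 48.0 48.0

rng = random.Random(0)
for _ in range(1000):                        # random splits of S into p loads
    cuts = sorted(rng.uniform(0, S) for _ in range(p - 1))
    loads = [b - a for a, b in zip([0.0] + cuts, cuts + [S])]
    assert energy_of_loads(loads, D) >= lower_bound(S, p, D) - 1e-9
```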

However, if the speed $S/(pD)$ is small compared to $f_{rel}$, the bound is very optimistic, since the reliability constraints are not matched at all. Indeed, replication must be used in such a case. In the following, we investigate bounds that account for replication, using the optimal solution of the Tri-Crit-Chain problem.

Proposition 4 (Lower bound using linear chains). For the Tri-Crit-Indep problem, the optimal solution cannot have an energy lower than the optimal solution to the Tri-Crit-Chain problem on a single processor with a deadline $pD$, where the weight of each re-executed task is lower than $D f_{rel}/\sqrt{2}$.

Proof. We can transform any solution to the Tri-Crit-Indep problem into a solution to the Tri-Crit-Chain problem with deadline $pD$ and a single processor. Tasks are arbitrarily ordered as a linear chain, and the solution uses the same number of executions and the same speed(s) for each task. It is easy to see that the Tri-Crit-Indep problem is more constrained, since the deadline on each processor must be enforced. The constraint on the weights of the re-executed tasks comes from Proposition 2. Therefore, the solution to the Tri-Crit-Chain problem is a lower bound for Tri-Crit-Indep. □

The optimal solution may however be far from this bound, since we do not know whether the tasks that are re-executed on a chain with a long deadline $pD$ can be executed at the same speed when the deadline is $D$. The constraint on the weight of the re-executed tasks allows us to slightly improve the bound, and this lower bound is the basis of the approximation algorithm that we design for Tri-Crit-Indep.



4.3 Approximation algorithm for Tri-Crit-Indep 

We have seen in Section 4.1 that there exists no constant factor approximation algorithm for Tri-Crit-Indep, unless P=NP, even without accounting for the reliability constraint. This is due to the constraint on the makespan and the maximum speed $f_{max}$. Therefore, in order to provide a constant factor approximation algorithm, we relax the constraint on the makespan and propose an $(\alpha, \beta)$-approximation algorithm. The solution $E_{algo}$ is such that $E_{algo} \le \alpha \times E_{opt}$, where $E_{opt}$ is the optimal solution with the deadline constraint $D$, and the makespan of the algorithm $M_{algo}$ is such that $M_{algo} \le \beta \times D$.

The result of Section 4.1 means that for all $\alpha > 1$, there is no $(\alpha, 1)$-approximation algorithm for Tri-Crit-Indep, unless P = NP. Therefore, we present an algorithm that realizes a $(1 + \frac{1}{\beta^2}, \beta)$-approximation, where the minimum relaxation on the deadline is smaller than 2. It is of course possible to run the algorithm with larger values of $\beta$, leading to a better guarantee on the energy consumption.

Sketch of the algorithm. In the first step of the algorithm, we schedule each task with a big weight alone on one processor, with no replication. A task $T_i$ is considered as big if $w_i > \max(S/p, D f_{rel})$. This step is done in polynomial time: we sort the tasks by non-increasing weights, and then we check whether the current task is such that $w_i > \max(S/p, D f_{rel})$. If it is the case, we schedule the task alone on a processor and we let $S = S - w_i$ and $p = p - 1$. The procedure ends when the current task is small enough, i.e., all remaining tasks are such that $w_i \le \max(S/p, D f_{rel})$, with the updated values of $S$ and $p$.

Figure 1: $(1 + \frac{1}{\beta^2}, \beta)$-approximation algorithm for independent tasks; panel (c): greedy algorithm to schedule the new tasks.
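A Python sketch of this first step. It assumes that a big task runs once at speed $w_i/D$, which exceeds $f_{rel}$ since $w_i > D f_{rel}$. The function name and the return format are hypothetical.

```python
def extract_big_tasks(weights, p, D, f_rel):
    """First step of the algorithm: while the largest remaining task satisfies
    w > max(S/p, D*f_rel), place it alone on a processor and update S and p.

    Returns (big tasks with their speeds, remaining weights, S, p).
    """
    remaining = sorted(weights, reverse=True)      # non-increasing weights
    S = sum(remaining)
    big = []
    # p never drops to 0: with p = 1, S/p = S is at least every weight.
    while remaining and remaining[0] > max(S / p, D * f_rel):
        w = remaining.pop(0)
        big.append((w, w / D))                     # alone, once, at speed w/D
        S -= w
        p -= 1
    return big, remaining, S, p


print(extract_big_tasks([10, 1, 1, 1, 1], 3, 2.0, 1.0))
# ([(10, 5.0)], [1, 1, 1, 1], 4, 2)
```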

• If $S \ge pDf_{rel}$, i.e., the load is large enough, we do not use replication. Instead, we schedule the tasks at speed $S/(pD)$ using a simple scheduling heuristic, Decreasing-First-Fit [13]. Tasks are sorted by non-increasing weights, and at each time step, we schedule the current task on the least loaded processor. Thanks to the lower bound of Proposition 3, the energy consumption is not greater than the optimal energy consumption, and we determine $\beta$ such that the deadline is enforced.

• If $S < pDf_{rel}$, the previous bound is not good enough, and therefore we use the FPTAS on a linear chain of tasks with deadline $pD$ for Tri-Crit-Chain (see Theorem 2). The FPTAS is called with

$$\varepsilon = \min\left(\frac{2 w_{min}}{3S}\left(\frac{f_{min}}{f_{rel}}\right)^2, \frac{1}{3\beta^2}\right), \qquad (2)$$

where $w_{min} = \min_{1 \le i \le n} w_i$. Note that it is slightly modified so that only tasks of weight $w < D f_{rel}/\sqrt{2}$ can be replicated, and that we enforce a minimum speed $f_{min}$. The FPTAS therefore determines which tasks should be executed twice, and it fixes all execution speeds.

  We then use Decreasing-First-Fit in order to map the tasks onto the $p$ processors, at the speeds determined earlier. The new set of tasks includes both executions in case of replication, and tasks are sorted by non-increasing execution times (since all speeds are fixed). At each time step, we schedule the current task on the least loaded processor. If some tasks cannot fit in one processor within the deadline $\beta D$, we re-execute them at speed $w_i/(\beta D)$ on two processors. Thanks to the lower bound of Proposition 4, we can bound the energy consumption in this case.
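The mapping heuristic used in both branches can be sketched in Python as follows. It operates on execution times, i.e., weights divided by the speeds already fixed. Ties between equally loaded processors are broken by processor index, a detail the text leaves open.

```python
import heapq


def decreasing_first_fit(times, p):
    """DECREASING-FIRST-FIT (the LPT rule): sort the execution times in
    non-increasing order and give each task to the least loaded processor.

    Returns the makespan and, for each processor, the indices of its tasks.
    """
    order = sorted(range(len(times)), key=lambda i: times[i], reverse=True)
    heap = [(0.0, k) for k in range(p)]         # (current load, processor)
    assignment = [[] for _ in range(p)]
    for i in order:
        load, k = heapq.heappop(heap)           # least loaded processor
        assignment[k].append(i)
        heapq.heappush(heap, (load + times[i], k))
    return max(load for load, _ in heap), assignment


print(decreasing_first_fit([3, 3, 2, 2, 2], 2))   # (7.0, [[0, 2, 4], [1, 3]])
```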

We illustrate the algorithm on an example in Figure 1, where eleven tasks must be mapped on six processors. For each task, we represent its execution speed as its height, and its execution time as its width. There are two big tasks, of weights $w_1$ and $w_2$, that are each mapped on a distinct processor. Then, we have $p = 4$ and we call Approx-Chain with deadline $4D$; two tasks are replicated. Finally, Decreasing-First-Fit greedily maps all instances of the tasks, slightly exceeding the original bound $D$, but all tasks fit within the extended deadline.

This algorithm leads to the following theorem: 



Theorem 3. For the problem Tri-Crit-Indep, there are $(1 + \frac{1}{\beta^2}, \beta)$-approximation algorithms, for all $\beta \ge 2 - \Theta(\frac{1}{p})$, that run in polynomial time.

Before proving Theorem 3, we give some preliminary results. We first prove the optimality of the first step of the algorithm, i.e., that the optimal solution would schedule tasks of weight greater than $\max(S/p, D f_{rel})$ alone on a processor:

Proposition 5. In any optimal solution to Tri-Crit-Indep, each task $T_i$ such that $w_i > \max(S/p, D f_{rel})$ is executed only once, and it is alone on its processor.

Proof. Let us prove the result by contradiction. Suppose that there exists a task $T_i$ such that $w_i > \max(S/p, D f_{rel})$, and that this task is executed on processor $p_1$. Suppose also that there is another task $T_j$ executed on $p_1$, with $w_j \le w_i$. Necessarily, there exists a processor, say $p_2$, whose load is smaller than $S/p$, since the load of $p_1$ is strictly greater than $S/p$. Consider the energy of the tasks executed on processors $p_1$ and $p_2$. Because of the convexity of the energy function, it is strictly better to execute task $T_j$ on processor $p_2$. Then $T_i$ is executed alone on processor $p_1$, at a speed $w_i/D > f_{rel}$. □

Next, we prove a lemma that will allow us to tackle the case where the load is large enough ($S \ge pDf_{rel}$), and we obtain a minimum on the approximation ratio of the deadline $\beta$.

Lemma 6. For the problem Tri-Crit-Indep where each task $T_i$ is such that $w_i \le \max(S/p, D f_{rel})$, scheduling each task only once at speed $\max(f_{rel}, \frac{S}{pD})$ with the Decreasing-First-Fit heuristic leads to a makespan of at most $\beta D$, with $\beta = \max\left(2 - \frac{3}{2p+1}, 2 - \frac{p+2}{4p+2}\right)$.

Note that we introduce $\max(S/p, D f_{rel})$ since the lemma is also used in the case $S < pDf_{rel}$. Also, $\beta$ is increasing with $p$, and the bound is in fact computed for a number of processors smaller than the original one (some processors are dedicated to big tasks). Hence the value of $\beta$ computed with the total number of processors $p$ is not smaller, and it is possible to achieve a makespan of at most $\beta D$.

Proof. Let $l_{dff}$ be the maximal load of the processors after applying Decreasing-First-Fit on the weights of the tasks. Let us find $\beta$ such that $l_{dff} \le \beta \frac{S}{p}$. This means that within a time $\beta D$, we can schedule all tasks at speed $\frac{S}{pD}$, and therefore at speed $\max(f_{rel}, \frac{S}{pD})$, since the most loaded processor then completes within the deadline $\beta D$.

Let $l_{opt}$ be the maximal load of the processors in an optimal solution, and let $T_i$ be the last task executed on the processor with the maximal load $l_{dff}$ by Decreasing-First-Fit. We have either $w_i \le l_{opt}/3$ or $w_i > l_{opt}/3$.

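• If $w_i \le l_{opt}/3$, we know that $l_{opt} \le l_{dff} \le \left(\frac{4}{3} - \frac{1}{3p}\right) l_{opt}$, since Decreasing-First-Fit is a $\left(\frac{4}{3} - \frac{1}{3p}\right)$-approximation [13]. We want to compare $l_{opt}$ to $S/p$ (average load).

We consider the solution of Decreasing-First-Fit. At the time when $T_i$ was scheduled, all the processors were at least as loaded as the one on which $T_i$ was scheduled, and hence we obtain a lower bound on $S$: $S \ge (p - 1)(l_{dff} - w_i) + l_{dff}$. Furthermore, $l_{dff} - w_i \ge \frac{2}{3} l_{opt}$ (because $l_{dff} \ge l_{opt}$ and $w_i \le l_{opt}/3$). Finally, $S \ge (p - 1)\frac{2}{3} l_{opt} + l_{opt}$, which means that $l_{opt} \le \frac{3p}{2p+1} \frac{S}{p}$, and $l_{dff} \le \left(\frac{4}{3} - \frac{1}{3p}\right) \frac{3p}{2p+1} \frac{S}{p} = \left(2 - \frac{3}{2p+1}\right) \frac{S}{p}$.

In this case, with $\beta = 2 - \frac{3}{2p+1}$, we can execute all the tasks at speed $\max(f_{rel}, \frac{S}{pD})$ within the deadline $\beta D$.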

• If $w_i > l_{opt}/3$, it is known that Decreasing-First-Fit is optimal for the execution time [13], i.e., $l_{opt} = l_{dff}$, and we aim at finding an upper bound on $l_{opt}$. We assume in the following that tasks are sorted by non-increasing weights.

If $w_i > S/p$, then we show that $T_i$ is the only task executed on its processor (recall that $T_i$ is the last task executed on the processor with the maximal load by Decreasing-First-Fit). Indeed, there cannot be $p$ tasks of weight greater than $S/p$, hence $i < p$, and $T_i$ is the first task scheduled on its processor. Moreover, if Decreasing-First-Fit were to schedule another task on the processor of $T_i$, this would mean that the $p - 1$ other processors all have a load greater than $w_i$, and hence the total load would be greater than $S$. Then, since $w_i \le \max(S/p, D f_{rel})$ and $w_i > S/p$, we have $w_i \le D f_{rel}$, and we can execute each task at speed $f_{rel} = \max(f_{rel}, \frac{S}{pD})$ within a deadline $D$. Indeed, the maximal load is then $w_i$, by definition of $T_i$. Therefore, the result holds (with $\beta = 1$).

Now suppose that $w_i < S/p$. In that case, if $T_i$ were the only task executed on its processor, then we would have $l_{opt} = l_{dff} = w_i < S/p$, which is impossible since $S = \sum_k w_k \le p\, l_{opt}$. Therefore, $T_i$ is not the only task executed on its processor. A direct consequence of this fact is that $p + 1 \le i$. Indeed, Decreasing-First-Fit schedules the $p$ largest tasks on $p$ distinct processors; since $T_i$ is the last task scheduled on its processor, but not the only one, $T_i$ is not among the $p$ first scheduled tasks. Also, there are only two tasks on the processor executing $T_i$, since $w_i > l_{opt}/3$ and the tasks scheduled before $T_i$ have a weight at least equal to $w_i$. Finally, $p + 1 \le i \le 2p$.

After scheduling task $T_j$ on processor $j$ for $1 \le j \le p$, Decreasing-First-Fit schedules task $T_{p+j}$ on processor $p - j + 1$ for $1 \le j \le i - p$. Hence $T_i$ is scheduled on processor $2p - i + 1$, together with task $T_{2p-i+1}$, and we have $w_i + w_{2p-i+1} = l_{opt}$. Note that because the $w_j$ are sorted, $S \ge \sum_{j \le i} w_j$. We also have $w_{2p-i+1} \le S/p$: indeed, when $T_i$ was scheduled, the load of the $p$ processors was at least equal to the load of the processor where $T_{2p-i+1}$ was scheduled. Hence, $w_{2p-i+1}$ cannot be greater than $S/p$. Then, since $w_{2p-i+1} = l_{opt} - w_i$, we have $w_i \ge l_{opt} - S/p$, and finally $l_{opt} - S/p \le w_i \le l_{opt}/2$.

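In order to find an upper bound on $l_{opt}$, we provide a lower bound on $S$, as a function of $w_i$:

$$\begin{aligned}
S = \sum_{j=1}^{n} w_j \ge \sum_{j=1}^{i} w_j &= \sum_{j=1}^{2p-i+1} w_j + \sum_{j=2p-i+2}^{i} w_j \\
&\ge (2p-i+1)\, w_{2p-i+1} + (2(i-p)-1)\, w_i \\
&= (2p-i+1)(l_{opt} - w_i) + (2(i-p)-1)\, w_i \\
&= (2p-i+1)\, l_{opt} + (3i-4p-2)\, w_i = f(w_i).
\end{aligned}$$

We then have $f'(w_i) = 3i - 4p - 2$, and we consider two cases.

If $f'(w_i) \ge 0$, then we have $i \ge \frac{4p+2}{3}$, and finally $S \ge i\, w_i \ge \frac{4p+2}{3}\left(l_{opt} - \frac{S}{p}\right)$. We can conclude that $l_{opt} \le \frac{S}{p}\left(1 + \frac{3p}{4p+2}\right) = \frac{S}{p}\left(2 - \frac{p+2}{4p+2}\right)$.

Otherwise, $f'(w_i) < 0$ and $f$ is a decreasing function of $w_i$, i.e., its minimum is reached when $w_i$ is maximal, and $S \ge f(l_{opt}/2)$. Hence, $S \ge (2p - i + 1)\, l_{opt} + (3i - 4p - 2)\frac{l_{opt}}{2}$. Since $i \le 2p$, $2p - i + 1 > 0$, and

$$S \ge \frac{2(2p-i+1) + 3i - 4p - 2}{2}\, l_{opt} = \frac{i}{2}\, l_{opt}, \quad \text{i.e.,} \quad l_{opt} \le \frac{2S}{i}.$$

Finally, since $i \ge p + 1$, $l_{opt} \le \frac{2S}{p+1} = \frac{S}{p}\left(2 - \frac{2}{p+1}\right)$.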
Overall, if $w_i > l_{opt}/3$, we have the bound

$$l_{opt} \le \frac{S}{p} \times \max\left(2 - \frac{p+2}{4p+2},\ 2 - \frac{2}{p+1}\right).$$

Therefore, for $\beta \ge \max\left(2 - \frac{p+2}{4p+2}, 2 - \frac{2}{p+1}\right)$, we can execute all the tasks on the processor of maximal load (and hence all the tasks) at speed $\max(f_{rel}, \frac{S}{pD})$ within the deadline $\beta D$ in the case $w_i > l_{opt}/3$.

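We can now conclude the proof of Lemma 6. Consider $\beta = \max\left(2 - \frac{3}{2p+1}, 2 - \frac{p+2}{4p+2}, 2 - \frac{2}{p+1}\right)$, i.e., $\beta = \max\left(2 - \frac{3}{2p+1}, 2 - \frac{p+2}{4p+2}\right)$ (since $\frac{2}{p+1} \ge \frac{3}{2p+1}$ for $p \ge 1$). For this $\beta$, scheduling each task only once at speed $\max(f_{rel}, \frac{S}{pD})$ with the Decreasing-First-Fit heuristic leads to a makespan of at most $\beta D$. □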
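Lemma 6 can be probed empirically with the Python sketch below. It computes $\beta = \max\left(2 - \frac{3}{2p+1}, 2 - \frac{p+2}{4p+2}\right)$ and compares the Decreasing-First-Fit makespan with $\beta D$ on random instances satisfying $w_i \le \max(S/p, D f_{rel})$. The instance generator and its ranges are arbitrary choices of this sketch. Random testing proves nothing; it only guards against gross errors in the formula.

```python
import heapq
import random


def beta(p):
    """Deadline relaxation of Lemma 6 (the term 2 - 2/(p+1) is dominated)."""
    return max(2 - 3 / (2 * p + 1), 2 - (p + 2) / (4 * p + 2))


def lemma6_makespan(weights, p, D, f_rel):
    """Run every task once at speed max(f_rel, S/(pD)) with DECREASING-FIRST-FIT."""
    speed = max(f_rel, sum(weights) / (p * D))
    loads = [0.0] * p                         # a list of zeros is a valid heap
    for w in sorted(weights, reverse=True):
        heapq.heapreplace(loads, loads[0] + w / speed)   # least loaded processor
    return max(loads)


def check_lemma6(trials=3000, seed=1):
    """Return False as soon as a random instance violates the makespan bound."""
    rng = random.Random(seed)
    for _ in range(trials):
        p = rng.randint(1, 8)
        weights = [rng.uniform(0.1, 10.0) for _ in range(rng.randint(1, 30))]
        S, f_rel = sum(weights), rng.uniform(0.5, 2.0)
        D = rng.uniform(0.5, 2.0) * S / (p * f_rel)
        if max(weights) > max(S / p, D * f_rel):
            continue                          # hypothesis of Lemma 6 not met
        if lemma6_makespan(weights, p, D, f_rel) > beta(p) * D * (1 + 1e-9):
            return False
    return True


print([round(beta(p), 4) for p in (1, 2, 4, 8)], check_lemma6())
```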

We are now ready to prove Theorem 3.

Proof of Theorem 3. First, thanks to Proposition 5, we know that the first step of the algorithm takes decisions that are identical to the optimal solution. These tasks, executed once and alone on their processor, therefore have the same energy consumption and the same deadline as in the optimal solution. We can therefore safely ignore them in the remainder of the proof, and consider that for each task $T_i$, $w_i \le \max(S/p, D f_{rel})$.



In the case where $S \ge pDf_{rel}$, we use the fact that $S\left(\frac{S}{pD}\right)^2$ is a lower bound on the energy (Proposition 3). Each task is executed once at speed $\max(f_{rel}, \frac{S}{pD}) = \frac{S}{pD}$, and therefore the energy consumption is equal to the lower bound $S\left(\frac{S}{pD}\right)^2$. The bound on the deadline is obtained by applying Lemma 6.

We now focus on the case $S < pDf_{rel}$. Therefore, in the following, $\max(\frac{S}{pD}, f_{rel}) = f_{rel}$. The algorithm runs the FPTAS on a linear chain of tasks with deadline $pD$, and $\varepsilon$ as defined in Equation (2). The FPTAS returns a solution on the linear chain with an energy consumption $E_{FPTAS}$ such that $E_{FPTAS} \le (1 + \varepsilon)^2 E_{chain}$, where $E_{chain}$ is the optimal energy consumption for Tri-Crit-Chain with deadline $pD$ on a single processor. According to Proposition 4, since the solution for the linear chain is a lower bound, the optimal solution of Tri-Crit-Indep is such that $E_{opt} \ge E_{chain}$.

For each task $T_i$, let $f_i^{chain}$ be the speed of its execution returned by the FPTAS for Tri-Crit-Chain. Note that in case of re-execution, both executions occur at the same speed (Lemma 3). We now consider the Tri-Crit-Indep problem with the set of tasks $\bar{V}$: for each task $T_i$, $\bar{T}_i \in \bar{V}$ and its weight is $\bar{w}_i = w_i f_{rel}/f_i^{chain}$. Moreover, if $T_i$ is re-executed, we add two copies of $\bar{T}_i$ in $\bar{V}$. Then, $\sum_{\bar{T}_i \in \bar{V}} \bar{w}_i \le pDf_{rel}$ by definition of the solution of Tri-Crit-Chain.

Let $\beta = \max\left(2 - \frac{3}{2p+1}, 2 - \frac{p+2}{4p+2}\right)$ be the relaxation on the deadline that we have from Lemma 6. The goal is to map all the tasks of $\bar{V}$ at speed $f_{rel}$ within the deadline $\beta D$, which amounts to mapping the original tasks at the speeds assigned by the FPTAS:

• If there are tasks $\bar{T}_i$ such that $\bar{w}_i > \beta D f_{rel}$, we execute them at speed $w_i/(\beta D)$ alone on their processor, so that they reach exactly the deadline $\beta D$. Note that in this case, the energy consumption of the algorithm becomes greater than $E_{FPTAS}$, since we execute these tasks faster than the FPTAS to fit on the processor.

• Tasks $\bar{T}_i$ such that $D f_{rel} < \bar{w}_i \le \beta D f_{rel}$ are executed alone on their processor at speed $f_{rel}$.

• For the remaining tasks and processors, we use Decreasing-First-Fit as in Lemma 6. The previous tasks take a time of at least $D$ in the solution of the FPTAS, and they are mapped alone on a processor, so we can safely remove them and apply the lemma. Note that the number of processors may now be smaller than $p$, hence leading to a smaller bound $\beta$.

In the end, all tasks are mapped within the deadline $\beta D$ (where $\beta$ is computed with the original number of processors). It remains to check the energy consumption of the solution returned by this algorithm.

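If all tasks are such that $\bar{w}_i \le \beta D f_{rel}$, then $E_{algo} = E_{FPTAS} \le (1 + \varepsilon)^2 E_{chain} \le (1 + \varepsilon)^2 E_{opt}$. According to Equation (2), $\varepsilon \le \frac{1}{3\beta^2}$, and therefore

$$E_{algo} \le \left(1 + \frac{2}{3\beta^2} + \frac{1}{9\beta^4}\right) E_{opt} \le \left(1 + \frac{1}{\beta^2}\right) E_{opt}.$$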

Otherwise, let $V'$ be the set of tasks such that $\bar{w}_i > \beta D f_{rel}$. For $\bar{T}_i \in V'$, $w_i > \beta D f_i^{chain}$. Since $w_i \le D f_{rel}$ (larger tasks have been processed in the first step of the algorithm), we have $f_i^{chain} < f_{rel}$. This means that $T_i$ belongs to the set of tasks that are re-executed by the FPTAS. Hence, since we enforced an additional constraint, we have $w_i < D f_{rel}/\sqrt{2}$. The least energy consumed for this task by any solution to Tri-Crit-Indep is therefore obtained when re-executing task $T_i$ on two distinct processors at speed $w_i/D$, in order to fit within the deadline $D$. Task $\bar{T}_i$ appears two times in $\bar{V}$, and we let $E$ be the minimum energy consumption required in the optimal solution for tasks of $V'$: $E = \sum_{\bar{T}_i \in V'} w_i \left(\frac{w_i}{D}\right)^2$.

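The algorithm leads to the same energy consumption as the FPTAS, except for the tasks of $V'$. These tasks are removed from the set of replicated tasks (of total weight $X$) and executed at speed $w_i/(\beta D)$:

$$E_{algo} = (S - X) f_{rel}^2 + 2X f_{re\text{-}ex}^2 - \sum_{\bar{T}_i \in V'} w_i f_{re\text{-}ex}^2 + \sum_{\bar{T}_i \in V'} w_i \left(\frac{w_i}{\beta D}\right)^2.$$

Since $E_{FPTAS} = (S - X) f_{rel}^2 + 2X f_{re\text{-}ex}^2$, we obtain

$$E_{algo} = E_{FPTAS} + \frac{1}{\beta^2} E - \sum_{\bar{T}_i \in V'} w_i f_{re\text{-}ex}^2.$$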

Furthermore, $E \le E_{opt}$, since it considers only the optimal energy consumption of a subset of tasks. We have $E_{FPTAS} \le (1 + \varepsilon)^2 E_{opt}$. From Proposition 1, it is easy to see that $E_{FPTAS} \le S f_{rel}^2$, i.e., $E_{FPTAS}$ is smaller than the energy of all tasks executed once at speed $f_{rel}$. Hence, $E_{FPTAS} \le (1 + \varepsilon)^2 \min(E_{opt}, S f_{rel}^2)$, and since $\varepsilon < 1$, $(1 + \varepsilon)^2 \le 1 + 3\varepsilon$. Finally, $E_{FPTAS} \le E_{opt} + 3\varepsilon S f_{rel}^2$. Thanks to Equation (2), $3\varepsilon S f_{rel}^2 \le 2 w_{min} f_{min}^2 \le \sum_{\bar{T}_i \in V'} w_i f_{re\text{-}ex}^2$ (note that there are at least two tasks in $V'$, since tasks are duplicated).

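Finally, reporting in the expression of $E_{algo}$,

$$E_{algo} \le E_{opt} + 3\varepsilon S f_{rel}^2 + \frac{1}{\beta^2} E_{opt} - \sum_{\bar{T}_i \in V'} w_i f_{re\text{-}ex}^2 \le \left(1 + \frac{1}{\beta^2}\right) E_{opt}.$$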



To conclude, we point out that this algorithm is polynomial in the size of the input and in $1/\varepsilon$. □


We can improve the approximation ratio on the energy for large values of $p$. The idea is to avoid the case in which tasks are replicated by the chain but do not fit within $\beta D$ because the speed at which they are re-executed is too small. To do so, we fix a value $\varepsilon^*$ such that $0 < \varepsilon^* < 1$ for $p > 24$. The variant of the algorithm is used only when $p > 24$ (after scheduling the big tasks). The algorithm decides that the load is large enough when $S \ge \frac{pDf_{rel}}{1 + \varepsilon^*}$, leading to a $((1 + \varepsilon^*)^2, \beta)$-approximation in this case. In the other case ($S < \frac{pDf_{rel}}{1 + \varepsilon^*}$), it is possible to prove the following: when there are tasks such that $\bar{w}_i > \beta D f_{rel}$, then necessarily all tasks are re-executed. Next, we apply Theorem 1 while fixing values for the $f_{\inf,i}$'s. This yields in polynomial time the optimal solution with new execution speeds, which can all be scheduled within $\beta D$ using Lemma 6. Details can be found in the appendix.

5 Conclusion 

In this paper, we have designed efficient approximation algorithms for the tri-criteria
energy/reliability/makespan problem, using replication and re-execution to increase the
reliability, and dynamic voltage and frequency scaling to decrease the energy consumption.
Because of the antagonistic relation between processor speeds and reliability, this
tri-criteria problem is much more challenging than the standard bi-criteria problem,
which aims at minimizing the energy consumption with a bound on the makespan,
without accounting for a constraint on the reliability of tasks.

We have tackled two classes of applications. For linear chains of tasks, we propose 
a fully polynomial time approximation scheme. However, we show that there exists no 
constant factor approximation algorithm for independent tasks, unless P=NP, and we 
are able in this case to propose an approximation algorithm with a relaxation on the 
makespan constraint: with a deadline at most two times larger than the original one, 
we can approach the optimal solution for energy consumption. 

As future work, it may be possible to improve the deadline relaxation by using an
FPTAS to schedule independent tasks […] rather than Decreasing-First-Fit […].
Also, an open problem is to find approximation algorithms for the tri-criteria problem
with an arbitrary graph of tasks. Even though efficient heuristics with re-execution of
tasks (but no replication) have been designed in [4], it is not clear how to derive
approximation ratios from these heuristics. It would be interesting to design efficient
algorithms using replication and re-execution for the general case, and to prove
approximation ratios for these algorithms. A first step would be to tackle fork and
fork-join graphs, inspired by the study on independent tasks.
Appendix: $(1 + \Theta(1/p), 2 - \Theta(1/p))$-approximation algorithm for Tri-Crit-Indep

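This algorithm is used only for $p > 24$, and we define:

$K = 1 - \dfrac{1}{c\,(2\sqrt{2}\,\beta - 1)}, \qquad \varepsilon^* = \dfrac{1}{\sqrt{2}\,c\,p\,K - 1}.$

Recall that $\beta = \max\left(2 - \frac{\ldots}{2p+1},\ 2 - \frac{\ldots}{p+2}\right)$. The value $\beta$ is therefore increasing with $p$, and for $p > 24$, we have $\beta > 1.9$. Furthermore, $c \approx 0.2838$ and $K > 0.2$. Finally, since $p > 24$, $0 < \varepsilon^* < 1$.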

Modifications to the original algorithm. 

The handling of big tasks is identical. However, we do not use replication when $S \ge pD f_{\text{rel}}/(1+\varepsilon^*)$: we schedule tasks at speed $\max(f_{\text{rel}}, S/(pD))$ using Decreasing-First-Fit. Proposition 6 below shows that we obtain the desired guarantee in this case. In the other case ($S < pD f_{\text{rel}}/(1+\varepsilon^*)$), we apply the FPTAS with parameter $\varepsilon^*$. It is then possible to show that either (i) we can schedule all tasks with the speeds returned by the FPTAS within the deadline $\beta D$, or (ii) there is at least one task that does not fit, but then all tasks are re-executed and we can find an optimal solution that can be scheduled thanks to Theorem 1. The correctness of this case is proven in Proposition 7.
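The first branch of the modified algorithm (load test, then single execution with Decreasing-First-Fit) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: Lemma 6 is not reproduced in this section, so we assume Decreasing-First-Fit sorts tasks by non-increasing execution time and places each on the first processor whose load stays within $\beta D$; the speed rule $\max(f_{\text{rel}}, S/(pD))$ is our reading of the speed used in this branch; the big-task step is omitted; and the helper names (`decreasing_first_fit`, `large_load`, `schedule_once`) are hypothetical.

```python
from typing import List, Optional, Tuple


def decreasing_first_fit(times: List[float], p: int, bound: float) -> Optional[List[int]]:
    """First-fit over p processors in non-increasing order of execution time (assumed rule).

    Returns the processor index chosen for each task, or None if some task
    fits on no processor without exceeding `bound`.
    """
    loads = [0.0] * p
    assignment = [-1] * len(times)
    # Longest tasks first; ties keep their input order (sorted is stable).
    order = sorted(range(len(times)), key=lambda i: times[i], reverse=True)
    for i in order:
        for q in range(p):
            if loads[q] + times[i] <= bound:
                loads[q] += times[i]
                assignment[i] = q
                break
        else:
            return None  # task i fits on no processor
    return assignment


def large_load(weights: List[float], p: int, deadline: float,
               f_rel: float, eps_star: float) -> bool:
    # Case test of the variant: S >= p * D * f_rel / (1 + eps*).
    return sum(weights) >= p * deadline * f_rel / (1 + eps_star)


def schedule_once(weights: List[float], p: int, deadline: float,
                  f_rel: float, beta: float) -> Tuple[float, Optional[List[int]]]:
    # Hypothetical common speed max(f_rel, S / (pD)); each task runs exactly once.
    speed = max(f_rel, sum(weights) / (p * deadline))
    times = [w / speed for w in weights]
    return speed, decreasing_first_fit(times, p, beta * deadline)


weights = [4.0, 3.0, 3.0, 2.0, 2.0, 2.0]
if large_load(weights, p=2, deadline=8.0, f_rel=1.0, eps_star=0.5):
    speed, assignment = schedule_once(weights, p=2, deadline=8.0, f_rel=1.0, beta=1.9)
    print(speed, assignment)  # 1.0 [0, 0, 0, 0, 0, 1]
```

With $\beta D = 15.2$, first fit packs five tasks on the first processor (load 14) before opening the second one; the guarantee being illustrated is only that every processor load stays within the relaxed deadline $\beta D$.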



Proposition 6. For the problem Tri-Crit-Indep where each task $T_i$ is such that $w_i < \max(\ldots, D f_{\text{rel}})$, if $S \ge pD f_{\text{rel}}/(1+\varepsilon^*)$, then scheduling each task only once at speed $\max(f_{\text{rel}}, S/(pD))$ with Decreasing-First-Fit is a $\left((1+\varepsilon^*)^2, \beta\right)$-approximation algorithm, with $\beta = \max\left(2 - \frac{\ldots}{2p+1},\ 2 - \frac{\ldots}{p+2}\right)$.



Proof. We use the fact that $S\,(S/(pD))^2$ is a lower bound on the energy (Proposition …). If each task is executed once at speed $\max(f_{\text{rel}}, S/(pD))$, then, since $f_{\text{rel}} \le (1+\varepsilon^*)\,S/(pD)$, the energy consumption is at most $(1+\varepsilon^*)^2$ times the optimal energy consumption. The bound on the deadline is obtained by applying Lemma 6. □
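The energy part of this proof can be written as a two-line chain of inequalities. This is a sketch assuming, as read above, that every task runs once at the common speed $\max(f_{\text{rel}}, S/(pD))$, that $\sum_i w_i = S$, and that the case condition is $S \ge pD f_{\text{rel}}/(1+\varepsilon^*)$:

```latex
\begin{align*}
E_{\text{algo}}
  &= \sum_{i} w_i \max\left(f_{\text{rel}}, \frac{S}{pD}\right)^2
   \le S \left((1+\varepsilon^*)\,\frac{S}{pD}\right)^2
   && \text{since } f_{\text{rel}} \le (1+\varepsilon^*)\tfrac{S}{pD}
      \iff S \ge \tfrac{pD f_{\text{rel}}}{1+\varepsilon^*} \\
  &= (1+\varepsilon^*)^2\, S \left(\frac{S}{pD}\right)^2
   \le (1+\varepsilon^*)^2\, E_{\text{opt}}
   && \text{since } S\left(\tfrac{S}{pD}\right)^2 \le E_{\text{opt}}.
\end{align*}
```

The factor is squared because energy grows with the square of the speed for a fixed amount of work.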

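Proposition 7. For the problem Tri-Crit-Indep where each task $T_i$ is such that $w_i < \max(\ldots, D f_{\text{rel}})$, if $S < pD f_{\text{rel}}/(1+\varepsilon^*)$, then there is a $\left((1+\varepsilon^*)^2, \beta\right)$-approximation algorithm, with $\beta = \max\left(2 - \frac{\ldots}{2p+1},\ 2 - \frac{\ldots}{p+2}\right)$.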

Proof. As in the original algorithm, we use the FPTAS and we obtain a $(1+\varepsilon^*)^2$-approximation algorithm, unless there is a task $T_i$ such that $w_i/f_i^{\text{FPTAS}} > \beta D$. Since $w_i < D f_{\text{rel}}$ (larger tasks have been processed in the first step of the algorithm), we have $f_i^{\text{FPTAS}} < f_{\text{rel}}$. This means that $T_i$ belongs to the set of tasks that are re-executed by Approx-Chain. Hence, since we enforced an additional constraint, we have $w_i < D f_{\text{rel}}/\sqrt{2}$. Finally,

$f_i^{\text{FPTAS}} = f_{\text{re-ex}} < \dfrac{w_i}{\beta D} < \dfrac{f_{\text{rel}}}{\sqrt{2}\,\beta}.$
Let $X_{\text{chain}}$ be the total weight of the re-executed tasks ($X_1$ or $X_2$ in Approx-Chain), and let $X_{\text{opt}} = c\,(pD f_{\text{rel}} - S)$ be the optimal weight to solve Tri-Crit-Chain with one processor. We compute $X_{\text{opt}} - X_{\text{chain}}$. By definition of $f_{\text{re-ex}}$ (Corollary 1), the optimal speed at which each re-execution should occur, we have:

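$pD = \dfrac{S - X_{\text{chain}}}{f_{\text{rel}}} + \dfrac{2X_{\text{chain}}}{f_{\text{re-ex}}} = \dfrac{S - X_{\text{opt}}}{f_{\text{rel}}} + \dfrac{2X_{\text{opt}}}{f_{\text{opt}}},$

where $f_{\text{opt}} = \frac{2c}{1+c} f_{\text{rel}}$ (Corollary 1 applied to $X_{\text{opt}}$). We now express $X_{\text{opt}} - X_{\text{chain}}$: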

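$X_{\text{chain}}\left(\dfrac{2}{f_{\text{re-ex}}} - \dfrac{1}{f_{\text{rel}}}\right) = X_{\text{opt}}\left(\dfrac{1+c}{c\,f_{\text{rel}}} - \dfrac{1}{f_{\text{rel}}}\right) = \dfrac{X_{\text{opt}}}{c\,f_{\text{rel}}},$

and therefore $X_{\text{chain}} = \dfrac{f_{\text{re-ex}}}{c\,(2f_{\text{rel}} - f_{\text{re-ex}})}\, X_{\text{opt}}$, and finally $X_{\text{opt}} - X_{\text{chain}} = \left(1 - \dfrac{f_{\text{re-ex}}}{c\,(2f_{\text{rel}} - f_{\text{re-ex}})}\right) X_{\text{opt}}$, which is minimized when $f_{\text{re-ex}}$ is maximized. Applying the upper bound on $f_{\text{re-ex}}$ from Equation (…), we obtain

$X_{\text{opt}} - X_{\text{chain}} \ge \left(1 - \dfrac{1}{c\,(2\sqrt{2}\,\beta - 1)}\right) X_{\text{opt}} = K \times X_{\text{opt}}.$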

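Since $S/(pD) < f_{\text{rel}}/(1+\varepsilon^*)$, we have $S/(pD) < \left(1 - \frac{\varepsilon^*}{1+\varepsilon^*}\right) f_{\text{rel}}$, and $f_{\text{rel}} - S/(pD) > \frac{\varepsilon^*}{1+\varepsilon^*} f_{\text{rel}} = \frac{f_{\text{rel}}}{\sqrt{2}\,c\,p\,K}$. Since $X_{\text{opt}} = c\,(pD f_{\text{rel}} - S)$ and $K > 0$, we obtain $K \times X_{\text{opt}} > D f_{\text{rel}}/\sqrt{2}$, and therefore $X_{\text{opt}} - X_{\text{chain}} > D f_{\text{rel}}/\sqrt{2}$. This means that each task that can be re-executed in any solution to Tri-Crit-Indep is indeed re-executed in the solution given by Approx-Chain, since all these tasks have a weight lower than $D f_{\text{rel}}/\sqrt{2}$. Since $X_{\text{opt}}$ is greater than the total weight of the tasks that can be re-executed, we can use Theorem 1 in the case $p = 1$, on the subset of tasks $T_i$ such that $w_i < D f_{\text{rel}}/\sqrt{2}$.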
The other tasks are executed once at speed frei- We define fini.i = so that 

/inf,i < 7^^/rei < Yf^frei ^nd we can apply Theorem[T] Then, in polynomial 

~ chain 

time, we have the optimal solution with new execution speeds: fi . Furthermore for 
each task Ti, necessarily 

< — ^ 1.9D. 



- chain — f . . . 

Note that since $p > 24$, we have $\beta > 1.9$, and thus $1.9D < \beta D$. We can therefore schedule the new tasks $T_i$ within the deadline relaxation using Decreasing-First-Fit, as a direct consequence of Lemma 6. □
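The chain quantities used in the proof of Proposition 7 can be checked numerically. The Python sketch below minimizes the single-processor chain energy under the time constraint $(S - X)/f_{\text{rel}} + 2X/f = pD$, recovers $c \approx 0.2838$ from $X_{\text{opt}} = c\,(pD f_{\text{rel}} - S)$, and checks that $X_{\text{opt}} - X_{\text{chain}}$ shrinks as $f_{\text{re-ex}}$ grows. Assumptions: the energy of work $w$ at speed $f$ is $w f^2$ (power $f^3$), the reliability lower bounds $f_{\inf,i}$ are ignored, the optimum satisfies $X_{\text{opt}} \le S$, the values of $p$, $D$, $S$, $f_{\text{rel}}$ are arbitrary, and the helper names are hypothetical.

```python
def chain_energy(x: float, s: float, slack: float, f_rel: float) -> float:
    """Energy of a single-processor chain when weight x is re-executed.

    The remaining weight s - x runs once at f_rel; the re-execution speed f
    is forced by the deadline: (s - x)/f_rel + 2x/f = pD, which gives
    f = 2 x f_rel / (slack + x) with slack = pD * f_rel - s.
    """
    f = 2 * x * f_rel / (slack + x)
    return (s - x) * f_rel ** 2 + 2 * x * f ** 2


def best_reexecuted_weight(s: float, slack: float, f_rel: float, iters: int = 200) -> float:
    # Ternary search: the energy is unimodal in x on [0, s].
    lo, hi = 0.0, s
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if chain_energy(m1, s, slack, f_rel) < chain_energy(m2, s, slack, f_rel):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


def x_chain(slack: float, f_rel: float, f_reex: float) -> float:
    # Weight re-executed at speed f_reex so that the deadline pD is met exactly.
    return slack * f_reex / (2 * f_rel - f_reex)


p, D, S, f_rel = 4, 10.0, 30.0, 1.0
slack = p * D * f_rel - S            # pD f_rel - S = 10
x_opt = best_reexecuted_weight(S, slack, f_rel)
c = x_opt / slack                    # X_opt = c (pD f_rel - S)
f_opt = 2 * c / (1 + c) * f_rel      # re-execution speed at the optimum
print(round(c, 4))                   # 0.2838

# X_opt - X_chain = (1 - f/(c(2 f_rel - f))) X_opt decreases as f_re-ex grows.
gaps = [x_opt - x_chain(slack, f_rel, f) for f in (0.1, 0.2, 0.3, 0.4)]
print(all(a > b for a, b in zip(gaps, gaps[1:])))  # True
```

Plugging $f_{\text{re-ex}} = f_{\text{opt}}$ into `x_chain` gives back $X_{\text{opt}}$, which is the identity $X_{\text{chain}} = \frac{f_{\text{re-ex}}}{c(2f_{\text{rel}} - f_{\text{re-ex}})} X_{\text{opt}}$ evaluated at the optimum.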

We can conclude, thanks to Propositions 6 and 7, that since $\varepsilon^*$ is in $\Theta(1/p)$ and $\beta$ is in $2 - \Theta(1/p)$, this algorithm is a $(1 + \Theta(1/p), 2 - \Theta(1/p))$-approximation. Indeed, $\varepsilon^* < 1$ and therefore $(1+\varepsilon^*)^2 \le 1 + 3\varepsilon^*$.

Furthermore, the algorithm is polynomial in the size of the input and in 